Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
en:it:inside_sotkanet [2017-10-17 05:34]
Heikki Siltala
en:it:inside_sotkanet [2017-10-17 05:43] (current)
Heikki Siltala
Line 3: Line 3:
 <wrap info>​This article was written in 2005.</​wrap>​ <wrap info>​This article was written in 2005.</​wrap>​
  
-{{ sotkanet:​sotkanet.gif}}+{{ images:sotkanet:​sotkanet.gif}}
  
 SOTKAnet indicator bank is a public accessible web information service that publishes detailed information on well-being and health in Finland. SOTKAnet was launched by STAKES, the National Research and Development Centre for Welfare and Health, on Monday 14th March 2005 at [[http://​www.sotkanet.fi/​|www.sotkanet.fi]]. SOTKAnet is currently completely free for all users and every bit of the service is equally available in three languages (Finnish, Swedish and English). ​ SOTKAnet indicator bank is a public accessible web information service that publishes detailed information on well-being and health in Finland. SOTKAnet was launched by STAKES, the National Research and Development Centre for Welfare and Health, on Monday 14th March 2005 at [[http://​www.sotkanet.fi/​|www.sotkanet.fi]]. SOTKAnet is currently completely free for all users and every bit of the service is equally available in three languages (Finnish, Swedish and English). ​
Line 44: Line 44:
 SOTKAnet'​s data warehouse runs on Oracle RDBMS (currently on version 9i) and is designed so that it can be accessed with any relational database aware tool like SQL command interface, Web Services API, Oracle Discoverer, Oracle Reports, ODBC and JDBC connectivity and of course by the custom-built SOTKAnet web interface. Oracle'​s SQL*Plus command line interface is an example of a primitive way to get into the data while the SOTKAnet web interface at www.sotkanet.fi is probably the most sophisticated. ​ SOTKAnet'​s data warehouse runs on Oracle RDBMS (currently on version 9i) and is designed so that it can be accessed with any relational database aware tool like SQL command interface, Web Services API, Oracle Discoverer, Oracle Reports, ODBC and JDBC connectivity and of course by the custom-built SOTKAnet web interface. Oracle'​s SQL*Plus command line interface is an example of a primitive way to get into the data while the SOTKAnet web interface at www.sotkanet.fi is probably the most sophisticated. ​
  
-{{ image:​sotkanet:​sotkanet_datamodel.gif?​240}}+{{ images:​sotkanet:​sotkanet_datamodel.gif?​240}}
  
 One design goal of SOTKAnet data warehouse was to provide fastest possible query response time. Update and insert efficiency wasn't that important. As far as know it is a fact that the fastest way to provide calculated values to user is to calculate them beforehand and at the query time just fetch the pre-calculated results. We stick to this rule and ended up storing all the pre-calculated indicator values on different dimension levels in a single fact table. So requesting one indicator value from SOTKAnet is technically just fetching one row from the fact table and requesting 500 indicator values from SOTKAnet is to fetch 500 rows from the fact table. ​ One design goal of SOTKAnet data warehouse was to provide fastest possible query response time. Update and insert efficiency wasn't that important. As far as know it is a fact that the fastest way to provide calculated values to user is to calculate them beforehand and at the query time just fetch the pre-calculated results. We stick to this rule and ended up storing all the pre-calculated indicator values on different dimension levels in a single fact table. So requesting one indicator value from SOTKAnet is technically just fetching one row from the fact table and requesting 500 indicator values from SOTKAnet is to fetch 500 rows from the fact table. ​
Line 52: Line 52:
 At query time it works just like any other star: only the dimension tables and the fact table are used at query time so there is now snow flaking. But before the actual indicator data fetching query when the user browses the dimensions in order to make the query, then there are additional tables to help the dimension browsing. ​ At query time it works just like any other star: only the dimension tables and the fact table are used at query time so there is now snow flaking. But before the actual indicator data fetching query when the user browses the dimensions in order to make the query, then there are additional tables to help the dimension browsing. ​
    
-{{ image:​sotkanet:​sotkanet_dimension.gif?​240}}+{{ images:​sotkanet:​sotkanet_dimension.gif?​240}}
  
 If we take a closer look at area dimension that is described in figure we can see that area dimension consist of three tables. The one connected to fact table is the actual dimension table. It has one row for every municipality,​ sub-region, region, hospital district, province etc in Finland. The hierarchy between the different levels of area dimension (like what municipalities lay inside a given hospital district) is not stored in the dimension table. These hierarchies are stored in a separate hierarchy table, in this case "area hierarchy"​. There is also a third table that holds the possible levels (also know as "​categories"​) of a dimension. Please note that some dimensions have a little bit different structure and some dimension tables like sex also have a row to mark the situation that the value of the dimension is unspecified. In area dimension there is no such a row since there is no indicator data where area could be unspecified. ​ If we take a closer look at area dimension that is described in figure we can see that area dimension consist of three tables. The one connected to fact table is the actual dimension table. It has one row for every municipality,​ sub-region, region, hospital district, province etc in Finland. The hierarchy between the different levels of area dimension (like what municipalities lay inside a given hospital district) is not stored in the dimension table. These hierarchies are stored in a separate hierarchy table, in this case "area hierarchy"​. There is also a third table that holds the possible levels (also know as "​categories"​) of a dimension. Please note that some dimensions have a little bit different structure and some dimension tables like sex also have a row to mark the situation that the value of the dimension is unspecified. In area dimension there is no such a row since there is no indicator data where area could be unspecified. ​
Line 72: Line 72:
 The indicator computation subsystem is one of my innovations that I'm very proud of. It provides very fast data flow, accurate results, easy management and fast error tracing. An overview of it can bee seen in the figure. ​ The indicator computation subsystem is one of my innovations that I'm very proud of. It provides very fast data flow, accurate results, easy management and fast error tracing. An overview of it can bee seen in the figure. ​
    
-{{ image:​sotkanet:​sotkanet_computation.gif?​240}}+{{ images:​sotkanet:​sotkanet_computation.gif?​240}}
  
 It is actually a set of database objects. It consists of huge load of PL/SQL code to control and to manage process, some support tables to store state and statistical information and a large number of interconnected materialized views (also known as "​nested materialized views"​). Materialized vies are a feature of Oracle database. Nested materialized views are the actual objects that calculate an indicator value. We call a set of nested materialized views that calculate an indicator "a chain"​. A chain consists of a number of materialized views that we call "the rings"​. The rings are organized into levels. Level 1 is the nearest level to the data sources and the data flows from level 1 to level 2, level 2 to level 3 etc. A ring in a level is allowed to use only the rings on levels before its own level as sources. I have managed built a rather complex PL/SQL package to handle the refresh flow, to transfer the data from the last level of rights to the fact table, to control the error situations and to calculate and store metadata and control information of the calculation process. The implementation of such a PL/SQL hassle was a hard task but now on its third generation it runs smoothly. ​ It is actually a set of database objects. It consists of huge load of PL/SQL code to control and to manage process, some support tables to store state and statistical information and a large number of interconnected materialized views (also known as "​nested materialized views"​). Materialized vies are a feature of Oracle database. Nested materialized views are the actual objects that calculate an indicator value. We call a set of nested materialized views that calculate an indicator "a chain"​. A chain consists of a number of materialized views that we call "the rings"​. The rings are organized into levels. Level 1 is the nearest level to the data sources and the data flows from level 1 to level 2, level 2 to level 3 etc. A ring in a level is allowed to use only the rings on levels before its own level as sources. I have managed built a rather complex PL/SQL package to handle the refresh flow, to transfer the data from the last level of rights to the fact table, to control the error situations and to calculate and store metadata and control information of the calculation process. The implementation of such a PL/SQL hassle was a hard task but now on its third generation it runs smoothly. ​
Line 94: Line 94:
 SOTKAnet web interface available at http://​www.sotkanet.fi is the channel from the indicator bank to the web. I never stop wondering how simple it makes everything to look like. Just go there and fetch the data you like. You don't have to know anything about the tremendous efforts and the innovations required to build the data warehouse under it. The web interface succeeds in hiding all the complexity. The web interface is built on Oracle Application Server and Oracle Portal and it is designed and implemented by Oracle Finland. My role was only to contribute the process. Most of the presentation components seen on the interface are portlets glued together by Oracle Portal. The implementation languages of the portlets were Java (J2EE) and PL/SQL. The web interface also utilizes Oracle Map Viewer which is a part of Oracle Application Server. SOTKAnet web interface available at http://​www.sotkanet.fi is the channel from the indicator bank to the web. I never stop wondering how simple it makes everything to look like. Just go there and fetch the data you like. You don't have to know anything about the tremendous efforts and the innovations required to build the data warehouse under it. The web interface succeeds in hiding all the complexity. The web interface is built on Oracle Application Server and Oracle Portal and it is designed and implemented by Oracle Finland. My role was only to contribute the process. Most of the presentation components seen on the interface are portlets glued together by Oracle Portal. The implementation languages of the portlets were Java (J2EE) and PL/SQL. The web interface also utilizes Oracle Map Viewer which is a part of Oracle Application Server.
  
-{{image:​sotkanet:​sotkanet_front.gif?​250}} {{image:​sotkanet:​sotkanet_search.gif?​250}} {{image:​sotkanet:​sotkanet_result.gif?​250}} {{image:​sotkanet:​sotkanet_lines.gif?​250}}  +{{images:​sotkanet:​sotkanet_front.gif?​250}} {{images:​sotkanet:​sotkanet_search.gif?​250}} {{images:​sotkanet:​sotkanet_result.gif?​250}} {{images:​sotkanet:​sotkanet_lines.gif?​250}}  
-{{image:​sotkanet:​sotkanet_bars.gif?​250}} {{image:​sotkanet:​sotkanet_map.gif?​250}} ​+{{images:​sotkanet:​sotkanet_bars.gif?​250}} {{images:​sotkanet:​sotkanet_map.gif?​250}} ​
            
 ===== The bottom line ===== ===== The bottom line =====