Recent technological advances in high-throughput experimental analysis have had a profound impact on the practices and scope of biomedical research. The volumes of data collected at the genome scale and the requirements for tools to analyze them have largely mobilized the data analysis community.

Translational Bioinformatics is an emerging field that addresses the current challenges of integrating increasingly voluminous amounts of molecular and clinical data. Its aim is to provide a better understanding of the molecular basis of disease, which in turn will inform clinical practice and ultimately improve human health.

More formally, Translational Bioinformatics can be defined as the “development of storage, analytic, and interpretative methods to optimize the transformation of increasingly voluminous biomedical data – genomic data in particular – into proactive, predictive, preventive, and participatory health management”.

IT Infrastructure for Biomedical Research

In recent years, the availability of new technologies for translational research has increased the need for effective informatics solutions to analyze and integrate the heterogeneous clinical and molecular Big Data of large patient populations, in order to fully understand a disease.

A leading solution for data integration in translational research has been developed by the NIH-funded i2b2 (Informatics for Integrating Biology and the Bedside) research center at Harvard University, Boston. The center developed i2b2, open-source software built around a data warehouse that integrates and exploits all the data coming from clinical practice and hospital admissions, making them available and easily accessible to researchers. Today, i2b2 environments are installed in hundreds of clinical centers and manage data for more than 100 million patients. The i2b2 platform is the basis of the “precision medicine program” recently funded by the US government under President Obama.
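In i2b2, every fact, whether clinical or molecular, is stored in one central fact table (observation_fact) linked to dimension tables such as patient_dimension and concept_dimension, i.e. a star schema. The sketch below uses Python's sqlite3 with a heavily simplified subset of those tables to show why this layout makes cohort queries across heterogeneous data straightforward. The column subset, the concept codes (in particular the hypothetical GENE: prefix) and the toy data are illustrative assumptions, not i2b2's shipped schema or ontology.

```python
import sqlite3

# Simplified subset of the i2b2 star schema (the real tables have many more columns).
SCHEMA = """
CREATE TABLE patient_dimension (patient_num INTEGER PRIMARY KEY, sex_cd TEXT, birth_date TEXT);
CREATE TABLE concept_dimension (concept_cd TEXT PRIMARY KEY, name_char TEXT);
CREATE TABLE observation_fact (
    patient_num INTEGER REFERENCES patient_dimension(patient_num),
    concept_cd  TEXT    REFERENCES concept_dimension(concept_cd),
    start_date  TEXT,
    valtype_cd  TEXT,   -- 'N' for numeric facts; NULL here for coded facts
    nval_num    REAL,
    units_cd    TEXT
);
"""

def build_demo_db():
    """Create an in-memory database with hypothetical clinical and molecular facts."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO patient_dimension VALUES (?, ?, ?)",
                    [(1, "F", "1950-03-01"), (2, "M", "1962-07-15"), (3, "F", "1971-11-30")])
    con.executemany("INSERT INTO concept_dimension VALUES (?, ?)",
                    [("ICD9:250.00", "Type 2 diabetes"),
                     ("LOINC:4548-4", "Hemoglobin A1c"),
                     ("GENE:TCF7L2_rs7903146", "Hypothetical genotype code")])
    # Diagnoses, lab values and genotypes all share the same fact table.
    con.executemany("INSERT INTO observation_fact VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "ICD9:250.00", "2015-01-10", None, None, None),
        (1, "LOINC:4548-4", "2015-01-10", "N", 8.1, "%"),
        (1, "GENE:TCF7L2_rs7903146", "2014-06-02", None, None, None),
        (2, "ICD9:250.00", "2016-05-20", None, None, None),
        (2, "LOINC:4548-4", "2016-05-20", "N", 6.4, "%"),
        (3, "GENE:TCF7L2_rs7903146", "2016-09-09", None, None, None),
    ])
    return con

# Cohort query: patients with a diabetes diagnosis AND an HbA1c above a threshold.
COHORT_SQL = """
SELECT DISTINCT f1.patient_num
FROM observation_fact f1
JOIN observation_fact f2 ON f1.patient_num = f2.patient_num
WHERE f1.concept_cd = 'ICD9:250.00'
  AND f2.concept_cd = 'LOINC:4548-4' AND f2.nval_num > ?
ORDER BY f1.patient_num
"""

def cohort(con, hba1c_threshold):
    return [row[0] for row in con.execute(COHORT_SQL, (hba1c_threshold,))]

con = build_demo_db()
print(cohort(con, 7.0))  # [1]
```

Because every criterion is just another filter on the same fact table, adding a molecular criterion (e.g. the genotype code) only requires one more self-join.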

The Laboratory of Biomedical Informatics “Mario Stefanelli” (BMI) has been an academic partner of the i2b2 project since 2008 and has implemented the i2b2 platform to support many translational research projects in all the clinical centers in Pavia:

IRCCS “C. Mondino National Institute of Neurology” Foundation, Pavia, Italy
    Lab. Experimental Neurobiology (2009)
IRCCS Fondazione Policlinico San Matteo, Pavia, Italy
    Dep. Transfusion Medicine (2010)
    Dep. Hematology Oncology (2015)
IRCCS Fondazione Salvatore Maugeri, Pavia, Italy
    Dep. Cardiology Rehabilitation (2012)
    Oncology Unit (2011)
    Hospital Information System (2016)

The University of Pavia has also implemented the i2b2 platform to support research activities in other Italian hospitals:

IRCCS Centro Cardiologico Monzino, Milano, Italy – Intensive Care Unit (2011)
IRCCS Fondazione Ca’ Granda Ospedale Maggiore Policlinico, Milano, Italy – Hospital Information System (2016)

and for data integration within several European projects:

FP7-INHERITANCE (La Coruña, Spain) – Cardiomyopathies (2011)
IMI-SUMMIT (Lund, Sweden) – Cardiovascular complications (2013)
FP7-MOSAIC (ASL and FSM Pavia; Valencia; Athens, Greece) – Diabetes (2015)

BIOMERIS is an academic spin-off of the University of Pavia and an official partner of i2b2 and Foundation. Since 2012, BIOMERIS has provided professional services to implement the i2b2 environment in EU health institutions.

Temporal Data Mining

The application of data mining techniques to the medical and biological domains has attracted great interest in recent years, thanks in part to the encouraging results achieved in many fields. One issue of particular interest in this area is the analysis of temporal data, usually referred to as Temporal Data Mining (TDM). Within TDM, research focuses on the analysis of time series collected by measuring clinical or biological variables at different points in time. Handling time explicitly in the data mining process is extremely attractive, as it gives deeper insight into the temporal behavior of complex processes and may help to forecast the future evolution of a variable or to extract causal relationships between the variables at hand.

An increasing number of TDM approaches are currently applied to the analysis of biomedical data. In functional genomics, for example, clustering techniques have been widely exploited to analyze gene expression time series in order to assess the function of unknown genes. TDM has also been successfully used to study gene expression time series of particular cell lines that are crucial for understanding key molecular processes of clinical interest, such as insulin action in muscle and the cell cycle in normal and tumor cells. Several works have also addressed the representation and processing of time series from the monitoring of clinical parameters, collected, for example, during an ICU stay.
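To illustrate the clustering idea, the sketch below groups expression profiles by their shape using plain k-means on z-normalized time series. This is one simple choice among many clustering approaches used in the literature; the deterministic initialization (first k profiles) and the toy profiles are assumptions made to keep the example reproducible.

```python
import math

def znorm(series):
    """Z-normalize a profile so that clustering compares shape, not scale."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / sd if sd > 0 else 0.0 for x in series]

def kmeans(profiles, k, iters=20):
    """Plain k-means on z-normalized profiles; centroids start from the first k profiles."""
    data = [znorm(p) for p in profiles]
    centroids = [list(p) for p in data[:k]]
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: each profile goes to the nearest centroid (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
                  for x in data]
        # Update step: each centroid becomes the mean of its profiles (kept if the cluster is empty).
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical time courses: two rising genes and two falling genes at different scales.
profiles = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [2, 4, 6, 8, 10], [10, 8, 6, 4, 2]]
print(kmeans(profiles, 2))  # [0, 1, 0, 1]
```

Z-normalization is what lets the gene rising from 2 to 10 fall in the same cluster as the one rising from 1 to 5: co-expressed genes often share a temporal shape rather than an absolute level.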

Temporal Rules

One of the most attractive applications of AI-based TDM is the extraction of temporal rules from data. Unlike association rules, temporal rules relate the consequent to the antecedent through some kind of temporal relationship; moreover, a temporal rule typically suggests a cause-effect association between its antecedent and its consequent. In the biomedical domain this can be particularly useful, for example for reconstructing gene regulatory networks or for discovering knowledge about the causes of a target event.

Learning temporal rules with complex patterns in biomedical time series

We developed two algorithms for mining temporal rules from data. The first is a method for discovering both association and temporal rules, designed to gain insight into the possible causes of non-adherence to therapeutic protocols in hemodialysis through the analysis of a set of monitoring variables. Time series are first summarized through qualitative patterns extracted with the technique of knowledge-based Temporal Abstractions (TAs); then, possible associations between those patterns and the non-adherence events are mined with an APRIORI-like procedure. The method only handles rules whose antecedents are a conjunction of simple patterns (i.e. patterns of the kind “increasing”, “decreasing”, …), where the conjunction is interpreted as a co-occurrence relationship (i.e. “variable A increasing” occurs at the same time as “variable B decreasing”). If this conjunction temporally precedes another simple pattern, say “variable C increasing”, sufficiently often, a rule of the kind “variable A increasing and variable B decreasing precedes variable C increasing” is generated.
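A minimal sketch of the two steps of this first method, under simplifying assumptions: a basic trend abstraction with a fixed threshold stands in for the knowledge-based TAs, support is an absolute count of antecedent occurrences followed by the consequent within max_gap samples, and only two-pattern conjunctions are searched. All function names and parameters are hypothetical, not the published implementation.

```python
from itertools import combinations

def trend_episodes(name, values, threshold=0.0):
    """Basic trend abstraction: label each step and merge runs of the same label.
    Returns (variable, label, start_index, end_index) episodes."""
    episodes = []
    for t in range(1, len(values)):
        d = values[t] - values[t - 1]
        label = "increasing" if d > threshold else "decreasing" if d < -threshold else "steady"
        if episodes and episodes[-1][1] == label:
            episodes[-1] = (name, label, episodes[-1][2], t)  # extend the current episode
        else:
            episodes.append((name, label, t - 1, t))
    return episodes

def cooccurrences(eps_a, eps_b):
    """Intervals in which an episode of the first pattern overlaps one of the second."""
    out = []
    for _, _, s1, e1 in eps_a:
        for _, _, s2, e2 in eps_b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:
                out.append((s, e))
    return out

def rule_stats(antecedents, consequents, max_gap):
    """Support: antecedent intervals followed by a consequent starting within max_gap.
    Confidence: support divided by the number of antecedent intervals."""
    hits = sum(1 for _, end in antecedents
               if any(0 <= c[2] - end <= max_gap for c in consequents))
    return hits, (hits / len(antecedents) if antecedents else 0.0)

def mine_rules(series, target, min_support=1, min_conf=0.5, max_gap=1):
    """APRIORI-like search for rules 'P1 and P2 (co-occurring) precedes target'."""
    episodes = {v: trend_episodes(v, vals) for v, vals in series.items()}
    consequents = [e for e in episodes[target[0]] if e[1] == target[1]]
    simple = {}  # (variable, label) -> its episodes, for non-target variables
    for v, eps in episodes.items():
        if v != target[0]:
            for e in eps:
                simple.setdefault((v, e[1]), []).append(e)
    # Level 1: keep simple patterns that precede the target often enough on their own
    # (a pruning heuristic in this sketch).
    frequent = {p: eps for p, eps in simple.items()
                if rule_stats([(e[2], e[3]) for e in eps], consequents, max_gap)[0] >= min_support}
    # Level 2: conjunctions are built only from frequent patterns.
    rules = []
    for (p1, e1), (p2, e2) in combinations(sorted(frequent.items()), 2):
        if p1[0] == p2[0]:
            continue  # co-occurrence is only meaningful across different variables
        sup, conf = rule_stats(cooccurrences(e1, e2), consequents, max_gap)
        if sup >= min_support and conf >= min_conf:
            rules.append((p1, p2, target, sup, conf))
    return rules

# Hypothetical monitoring variables sampled at regular intervals.
series = {"A": [0, 1, 2, 1, 2, 3], "B": [5, 4, 3, 4, 3, 2], "C": [0, 0, 0, 1, 1, 1]}
print(mine_rules(series, ("C", "increasing")))
```

On this toy data the search returns one rule, “A increasing and B decreasing precedes C increasing”, with support 1 and confidence 0.5: the conjunction occurs twice, but only its first occurrence is followed by C increasing.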
The second algorithm extends the first method to extract rules with arbitrarily complex patterns in both the rule antecedents and consequents. Such patterns can be defined in advance by the user (typically relying on prior knowledge of the problem domain), or they can be generated automatically by a complex pattern extractor. This extension can search for relationships between complex behaviors, which is particularly interesting in biomedical applications. For example, a drug is first absorbed and then utilized, so that its plasma distribution precedes its effect in the target tissue. In this case, it is important to look for complex episodes of the “up and down” type in the drug plasma concentration in order to automatically extract temporal knowledge from the data. The method enables the user to define episodes of interest, thus synthesizing the domain knowledge about a specific process, and to efficiently look for specific temporal interactions between such complex episodes.
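To illustrate the extension, the sketch below defines a complex pattern as a user-given sequence of consecutive trend labels, such as ("increasing", "decreasing") for an “up and down” episode, finds its occurrences in a series and pairs them with later episodes of another pattern. The basic trend abstraction, the composition of consecutive episodes and the max_gap notion of “precedes” are simplifying assumptions; the drug concentration and effect values are invented for illustration.

```python
def trend_episodes(values, threshold=0.0):
    """Basic trend abstraction returning (label, start_index, end_index) episodes."""
    episodes = []
    for t in range(1, len(values)):
        d = values[t] - values[t - 1]
        label = "increasing" if d > threshold else "decreasing" if d < -threshold else "steady"
        if episodes and episodes[-1][0] == label:
            episodes[-1] = (label, episodes[-1][1], t)  # extend the current episode
        else:
            episodes.append((label, t - 1, t))
    return episodes

def complex_episodes(values, pattern, threshold=0.0):
    """Occurrences of a user-defined complex pattern: a sequence of trend episodes
    in which each one starts where the previous one ends."""
    eps = trend_episodes(values, threshold)
    n = len(pattern)
    found = []
    for i in range(len(eps) - n + 1):
        window = eps[i:i + n]
        if tuple(label for label, _, _ in window) == tuple(pattern):
            found.append((window[0][1], window[-1][2]))  # interval covered by the whole pattern
    return found

def precedes(antecedents, consequents, max_gap):
    """Pairs in which the antecedent ends at most max_gap samples before the consequent starts."""
    return [(a, c) for a in antecedents for c in consequents if 0 <= c[0] - a[1] <= max_gap]

# Hypothetical data: plasma concentration rises and falls, then the tissue effect rises.
plasma = [0, 2, 5, 3, 1, 1, 1, 1]
effect = [0, 0, 0, 0, 0, 2, 3, 3]
up_down = complex_episodes(plasma, ("increasing", "decreasing"))  # [(0, 4)]
rise = complex_episodes(effect, ("increasing",))                  # [(4, 6)]
print(precedes(up_down, rise, max_gap=1))                         # [((0, 4), (4, 6))]
```

Treating a simple pattern as a complex pattern of length one lets the same matching code serve both antecedents and consequents.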

Healthcare Risk Analysis

Information technology for quality assessment of Hemodialysis Services

Several clinical studies have shown that “Failure to Adhere” to the hemodialysis treatment plan is associated with worse long-term outcomes in dialysis patients. We are currently working on a software tool called EMOSTAT, which automatically collects hemodialysis data, compares them with the prescriptions and, through statistical analysis, highlights the reasons for “failures to adhere”. The software design follows the principles of data mining tools, enabling the user to “drill down” into the data, starting from the dialysis-centre view and moving to the analysis of single patients.
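EMOSTAT's actual audit rules are not described here; the sketch below only illustrates the general idea under assumed relative tolerances. Each session's delivered values are compared with the prescribed ones, and failures are then aggregated at the centre level and per patient, mirroring the drill-down from centre view to single patient. Variable names, tolerances and data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical relative tolerances per monitored variable (not EMOSTAT's actual rules).
TOLERANCE = {"duration_min": 0.05, "blood_flow_ml_min": 0.10, "weight_loss_kg": 0.15}

def failures(session):
    """Variables whose delivered value deviates from the prescribed one beyond tolerance."""
    out = []
    for var, tol in TOLERANCE.items():
        prescribed = session["prescribed"][var]
        delivered = session["delivered"][var]
        if prescribed and abs(delivered - prescribed) / prescribed > tol:
            out.append(var)
    return out

def drill_down(sessions):
    """Centre-level failure counts per variable, plus the same counts per patient."""
    centre = Counter()
    by_patient = defaultdict(Counter)
    for s in sessions:
        for var in failures(s):
            centre[var] += 1
            by_patient[s["patient"]][var] += 1
    return centre, by_patient

def session(patient, prescribed, delivered):
    """Build a session record; values follow the order of TOLERANCE."""
    keys = list(TOLERANCE)
    return {"patient": patient,
            "prescribed": dict(zip(keys, prescribed)),
            "delivered": dict(zip(keys, delivered))}

sessions = [
    session("P1", (240, 300, 2.5), (220, 290, 1.8)),
    session("P1", (240, 300, 2.5), (238, 300, 2.0)),
    session("P2", (240, 300, 2.0), (240, 250, 1.9)),
]
centre, by_patient = drill_down(sessions)
print(centre.most_common(1))   # [('weight_loss_kg', 2)]
print(dict(by_patient["P1"]))  # {'duration_min': 1, 'weight_loss_kg': 2}
```

Relative rather than absolute tolerances keep one rule usable across patients with very different prescriptions.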

During the patient monitoring period, the software made it possible to highlight vascular problems and to intervene on the patients’ vascular access. Moreover, it enabled the dialysis centre to solve the non-adherence problems related to bulk blood flow. Finally, it clearly identified the management of body weight as the dialysis centre’s main source of adherence failures. EMOSTAT proved to be a reliable and efficient tool for the automated auditing of hemodialysis sessions.

EMOSTAT: Implementation of an automated system for monitoring adherence to hemodialysis treatment: a report of seven years of experience.


The term “medical informatics” refers to the discipline that studies how to represent, analyze and communicate biomedical data and knowledge, both within a single health care organization and between different organizations, often including the patient’s home. It also covers the information science and technology that support these tasks.

The aim of medical informatics in our laboratory is the design and development of systems that support patients and medical staff in the various stages of the clinical path. Examples of physician-oriented systems are: computerized clinical practice guidelines, which generate recommendations according to the best scientific evidence; careflow management systems, which allow different healthcare operators to share data and knowledge efficiently; process mining systems, which capture healthcare professionals’ behaviour from system logs; and decision trees, which support physicians in making decisions according to utility theory. Decision trees are also used for so-called “shared decisions”, in which patients and physicians reason together about unclear situations before making a final decision about a treatment. Examples of patient-oriented systems are: telemedicine and tele-homecare systems, often based on body area networks of sensors, which provide safer patient monitoring and lightweight decision support for those (non-critical) situations in which patients are allowed to change their own therapy (e.g. adjusting insulin doses according to daily glycemia measurements); software tools for cognitive rehabilitation at home; and questionnaire administration methods for measuring patients’ quality of life as a means of measuring a treatment effect.

Finally, since health care institutions are paying increasing attention to cost containment, our lab has also developed expertise in economic evaluations of healthcare programmes (cost-effectiveness and cost-utility studies), offering decision support to policy makers.

Although our lab is aimed at developing applications, research plays a vital role in it. The main research areas are knowledge representation, the understanding and elicitation of cognitive processes, ontologies for representing medical domain entities and their relationships, and decision theory.

Decision Support

Decision support systems (DSSs) have historically been classified into two categories: those based on probabilistic formalisms and those based on artificial intelligence (AI) formalisms. In our lab, we overcome this dichotomy by building systems that use both types of formalism, according to the specific decision task. As a concrete example, consider a clinical practice guideline made up of a set of recommendations. A recommendation can generally be represented as a production rule, a very well-known AI formalism. However, some recommendations may point to the need to share the decision with patients and their families, or to take into account values other than health outcomes, e.g. costs. In these cases, a probabilistic formalism such as an influence diagram or a decision tree is appropriate. These probabilistic models may also embed Markov models, which describe the transitions of a patient, or of a patient cohort, among a set of possible health states, starting from an initial state. This makes it possible to estimate, for example, life expectancy, quality-adjusted life years, costs and other values considered crucial for the decision at hand. Given the model results, patients, their relatives and healthcare professionals can reason and make a more informed decision.
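A minimal Markov cohort sketch, assuming three hypothetical health states (Well, Sick, Dead), yearly cycles, invented transition probabilities, utilities and costs, and rewards counted at the start of each cycle (no half-cycle correction). It shows how life expectancy, quality-adjusted life years and costs follow from propagating the cohort distribution over time.

```python
# Hypothetical three-state model with yearly cycles; all numbers are illustrative.
STATES = ("Well", "Sick", "Dead")
P = [[0.85, 0.10, 0.05],   # from Well
     [0.00, 0.80, 0.20],   # from Sick
     [0.00, 0.00, 1.00]]   # Dead is absorbing
UTILITY = (1.0, 0.6, 0.0)      # quality weight of one year spent in each state
COST = (500.0, 3000.0, 0.0)    # yearly cost of each state

def run_cohort(start, cycles, discount=0.0):
    """Propagate the cohort distribution and accumulate (discounted) rewards."""
    dist = list(start)
    life_years = qalys = costs = 0.0
    for t in range(cycles):
        w = 1.0 / (1.0 + discount) ** t
        life_years += w * (1.0 - dist[2])  # fraction of the cohort still alive
        qalys += w * sum(p * u for p, u in zip(dist, UTILITY))
        costs += w * sum(p * c for p, c in zip(dist, COST))
        # One Markov step: new_j = sum_i dist_i * P[i][j]
        n = len(STATES)
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return life_years, qalys, costs

ly, q, c = run_cohort(start=(1.0, 0.0, 0.0), cycles=200)
print(round(ly, 2), round(q, 2), round(c))  # 10.0 8.67 13333
```

With these numbers and a long horizon, the results match the closed form obtained from the fundamental matrix (I − Q)⁻¹ of the transient states: 1/0.15 ≈ 6.67 years in Well plus 0.10/(0.15 × 0.20) ≈ 3.33 years in Sick, i.e. 10 life-years and 6.67 + 0.6 × 3.33 ≈ 8.67 QALYs. Comparing two strategies simply means running the model with each strategy's transition matrix.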

Importantly, the decision may concern a specific patient or a population. In the first case, the patient must be asked questions to elicit his or her own preferences. This is a research area in itself, since different methods exist and must be tailored to the patient: some are based on direct questions, others on self-administered questionnaires. Together with our medical partners, we conduct experimental research on this topic.
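As an example of a direct-question method, the sketch below implements the time trade-off (TTO): the patient is repeatedly asked whether x years in full health are preferable to a fixed horizon in the health state of interest, and the utility is the ratio x/horizon at indifference. TTO is one well-known technique, not necessarily the one used in our studies; the bisection strategy, the 10-year horizon and the simulated respondent are assumptions for illustration.

```python
def time_trade_off(prefers_full_health, horizon_years=10.0, tol=0.01):
    """Elicit a utility with the time trade-off method via bisection.

    prefers_full_health(x) is the patient's answer to: 'Would you rather live
    x years in full health than horizon_years in the health state?'
    At the indifference point x, the utility of the state is x / horizon_years.
    """
    lo, hi = 0.0, horizon_years
    while hi - lo > tol * horizon_years:
        x = (lo + hi) / 2
        if prefers_full_health(x):
            hi = x  # full health still preferred: offer fewer healthy years
        else:
            lo = x  # health state preferred: offer more healthy years
    return (lo + hi) / 2 / horizon_years

def simulated_patient(x):
    """Hypothetical respondent whose (hidden) utility for the state is 0.7."""
    return x > 0.7 * 10.0

print(round(time_trade_off(simulated_patient), 2))  # 0.7
```

In practice, answers are neither perfectly consistent nor monotone, which is one reason why elicitation methods must be tailored to the patient.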

Another important topic is the analysis of physicians’ and patients’ compliance with the suggestions provided by a DSS. The DSS can suggest an action, but the final decision is up to the human. Measuring compliance is therefore very useful, either to identify weaknesses of the DSS and take appropriate corrective actions, or to detect incorrect human behaviours that need educational interventions.
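The sketch below shows one simple way to measure compliance, assuming each case records the recommendation that fired, the action the DSS suggested and the action actually taken. Recommendations that are often overridden are flagged for review. The case format, recommendation identifiers and the 0.5 flagging threshold are hypothetical.

```python
from collections import defaultdict

def compliance_report(cases, flag_below=0.5):
    """cases: iterable of (recommendation_id, suggested_action, actual_action).

    Returns the overall compliance rate, the rate per recommendation, and the
    recommendations whose compliance falls below flag_below.
    """
    counts = defaultdict(lambda: [0, 0])  # recommendation -> [compliant, total]
    for rec, suggested, actual in cases:
        counts[rec][1] += 1
        if suggested == actual:
            counts[rec][0] += 1
    total = sum(n for _, n in counts.values())
    overall = sum(c for c, _ in counts.values()) / total if total else 0.0
    per_rec = {rec: c / n for rec, (c, n) in counts.items()}
    # Often-overridden recommendations are candidates for review: either the
    # DSS is weak there, or users may need educational interventions.
    flagged = sorted(rec for rec, rate in per_rec.items() if rate < flag_below)
    return overall, per_rec, flagged

cases = (
    [("R1", "swallow test", "swallow test")] * 3
    + [("R1", "swallow test", "none")]
    + [("R2", "anticoagulant", "anticoagulant")]
    + [("R2", "anticoagulant", "antiplatelet")] * 2
)
overall, per_rec, flagged = compliance_report(cases)
print(round(overall, 3), per_rec["R1"], flagged)  # 0.571 0.75 ['R2']
```

Per-recommendation rates are more informative than the overall rate, since a single often-ignored recommendation points to a specific part of the guideline or of the user's behaviour.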

As a methodological approach to DSS, ontologies are used to represent the entities of the decision problem domain and their relationships.

Medical Areas:


Stroke- Guidelines for stroke management (prevention, treatment and rehabilitation) have been represented and implemented, and the results analysed. Non-compliance has been studied and correlated with health and cost outcomes.


Atrial Fibrillation- Guidelines for atrial fibrillation are currently being studied, with particular attention to the set of “parallel guidelines” that drug prescriptions generate for patients at home, to guide them towards correct drug administration. Decision trees are linked to the guideline, e.g. for the choice of whether or not to undergo oral anticoagulant treatment.

Rare Diseases

Amyloidosis- Guidelines for diagnosis and protocols for treatment are represented, with the final goal of linking them to the electronic medical record.

Tools: NEWGUIDE[1], Protégé[2], TreeAge[3]

Further information can be found in our publication repository or by contacting the person responsible for each area.

People working on this topic:

Supervisor: Silvana Quaglini

Collaborators: Enea Parimbelli, Paola Russo, Lucia Sacchi, Carla Rognoni, Stefania Rubrichi, Silvia Panzarasa, Paolo Tormene

External Collaborations:

Prof. Mor Peleg, Haifa University, Israel

Prof. Yuval Shahar, Ben Gurion University, Beer Sheva, Israel.

Prof. Werner Ceusters, University at Buffalo, Buffalo, NY, USA

Giuseppe Micieli, Anna Cavallini, IRCCS C. Mondino, Pavia, Italy.

Giampaolo Merlini, Giovanni Palladini, IRCCS Policlinico San Matteo, Pavia, Italy

Silvia Priori, Carlo Napolitano, Andrea Mazzanti, IRCCS S. Maugeri, Pavia, Italy

Most of the work on this topic is currently carried out within the EU project Mobiguide.

[1] P. Ciccarese, E. Caffi, S. Quaglini, M. Stefanelli. Architectures and tools for innovative health information systems: the Guide Project. International Journal of Medical Informatics 74 (7-8), 553-562.



Mobile Health

Telemonitoring has long been used to manage diseases and administer treatments in outpatients, with the aim of enforcing tighter control of the therapy. These systems most often address the management of chronic diseases, on the grounds that they help delay the onset of complications and reduce hospitalizations, thus saving the related costs. Until recently, remote monitoring of outpatients usually entailed sending a limited amount of information daily, or even less frequently, through a fixed station such as a PC, using landline connections and at the patient’s convenience.

The literature has shown that the effectiveness of telemonitoring increases greatly when the system is always available to the patient; fixed-station systems, which privileged the opposite paradigm, were consequently poorly exploited.

Today’s network connection quality, combined with the performance of current smartphones, instead allows the design of a new class of systems that seems to provide a viable and promising solution. Besides the improved comfort that mobility gives the patient, interfacing smartphones with external devices enables the ubiquitous, automatic, real-time transmission of signals, which may also help in monitoring clinical trials.

The Laboratory of Biomedical Informatics is particularly active in telemonitoring with smartphones and tablets, both as a provider of the enabling infrastructure and as a system integrator delivering custom solutions. To this end, we have developed a modular two-way synchronization infrastructure that is particularly valuable for mobile devices, since it buffers data under poor network coverage and thus prevents data loss [1].
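The store-and-forward idea behind such an infrastructure can be sketched as follows: the device keeps an outbox that is emptied only after the server acknowledges each record, and it pulls server-side changes it has not yet seen using a version counter. This is an illustrative sketch under assumed semantics (idempotent uploads keyed by record id, a single monotonically increasing change log), not the framework described in [1]; all class and method names are hypothetical.

```python
from collections import deque

class Server:
    """Hypothetical central site: idempotent uploads plus a versioned change log."""

    def __init__(self):
        self.records = {}
        self.log = []  # (version, record_id, payload) entries to send to devices

    def push(self, record_id, payload):
        # Re-sent records (e.g. after a lost acknowledgement) are ignored.
        self.records.setdefault(record_id, payload)
        return True  # acknowledgement

    def publish(self, record_id, payload):
        self.log.append((len(self.log) + 1, record_id, payload))

    def changes_since(self, version):
        return [entry for entry in self.log if entry[0] > version]


class MobileClient:
    """Store-and-forward client: data are buffered until the network is available."""

    def __init__(self, server):
        self.server = server
        self.outbox = deque()
        self.inbox = {}
        self.last_version = 0
        self.online = False

    def record(self, record_id, payload):
        self.outbox.append((record_id, payload))
        self.sync()

    def sync(self):
        if not self.online:
            return False  # poor coverage: keep the buffer, lose nothing
        # Upload: drop a record from the buffer only once it has been acknowledged.
        while self.outbox and self.server.push(*self.outbox[0]):
            self.outbox.popleft()
        # Download: fetch server-side changes not yet seen by this device.
        for version, record_id, payload in self.server.changes_since(self.last_version):
            self.inbox[record_id] = payload
            self.last_version = version
        return True


server = Server()
phone = MobileClient(server)
phone.record("glucose-001", 142)  # offline: buffered
phone.record("glucose-002", 156)  # offline: buffered
server.publish("questionnaire-7", "quality of life form")
phone.online = True
phone.sync()
print(sorted(server.records), phone.inbox, len(phone.outbox))
```

Making uploads idempotent is the design choice that keeps the protocol safe: if an acknowledgement is lost and the device re-sends a record, the server stores it only once.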

The infrastructure has been used as the basis for the remote monitoring of an Artificial Pancreas implemented on a wearable device running the Android operating system [2]. Besides guaranteeing patient safety, this system proved particularly useful for collecting data into a central database, so that the extensive computations needed for tuning and personalizing the insulin control algorithms could be delegated to a server.

Synchronization can also be exploited to implement generic Points of Care, such as the one we designed to support patients affected by nephropathy. For these patients we developed a system directly linked to a scale and a blood pressure monitor, which allowed them to send daily measurements to the clinic [3]. A current effort is the implementation of a Point of Care supporting remote cardiotocographic consultations for pregnant women. In this case, a smartphone has been interfaced with a device for capturing cardiac signals, allowing women at risk to acquire data autonomously and send them to their gynecologists for frequent consultations.

Finally, to complement the transmission of plain signals with contextual information, we also implemented a system running on tablets that downloads custom questionnaires on demand from a remote site. The system allows patients to fill in questionnaires as many times as needed and synchronizes their answers with the remote site for prompt review by clinicians [4].

The examples above represent only some of the possible applications of mobile networked technology to health care, since the number of applications is virtually countless.

Medical Areas:

Artificial Pancreas, Diabetes and Chronic Diseases Treatment,

Cardiotocography, Clinical Investigations.

People working on this topic:

Collaborators: Ignazio Secci, Stefania Scarpellini, Germana Ginardi

Supervisor: Giordano Lanzola

External Collaborations:

Prof. Giovanni Magenes, Laboratory of Bioengineering, University of Pavia, Pavia, Italy.

Prof. Maria Gabriella Signorini, Politecnico di Milano, Milano, Italy.

Prof. Lalo Magni, Dept. of Civil Engineering and Architecture, University of Pavia, Pavia, Italy.


Funny [1]; Chronic Disease Monitoring [2]; AP Monitoring Trial [3]; GQuest [4]



[1] D. Capozzi, G. Lanzola. A data synchronization framework for personal health systems. In: Proceedings of the MobiHealth 2011 Conference, K.S. Nikita et al. (Eds.), LNICST, Vol. 83, pp. 300-304; ISBN: 978-3-642-29733-5.

[2] D. Capozzi, G. Lanzola. A Generic Telemedicine Infrastructure for Monitoring an Artificial Pancreas Trial. Computer Methods and Programs in Biomedicine, 2013 (in press).

[3] D. Capozzi, G. Lanzola. An Agent-Based Architecture for Home Care Monitoring and Education of Chronic Patients. In: Proceedings of the 2010 IEEE Conference on Complexity in Engineering, COMPENG’10, pp. 138-140.

[4] G. Ginardi, G. Lanzola. A Mobile Platform for Administering Questionnaires and Synchronizing their Answers. In: Proceedings of the Mobile Learning 2013 Conference.

Healthcare Processes: Careflow Modeling and Process Mining

Healthcare institutions are facing increasing pressure to reduce costs while at the same time improving the quality of care. To reach this goal, healthcare administrators and expert physicians need to evaluate the services their institution provides. Service evaluation requires analyzing medical processes, which are often automated and logged by means of workflow technology.

Process analysis (PA) covers the simulation and the diagnosis of processes. Simulation can support the evaluation of performance issues, while diagnosis can highlight, e.g., similarities, differences, and adaptation or redesign needs. Indeed, the existence of different patient categories, or of local resource constraints, can make differences between process instances necessary and process adaptation compulsory (even when the medical process implements a well-accepted clinical guideline). Proper PA techniques are strongly needed when a given process model does not exist, e.g. because a full clinical guideline has not been provided and only some recommendations are implemented. In this case, process mining techniques can be exploited to extract process-related information (e.g. process models) from log data. It is worth noting, however, that the mined process can also be compared with the existing guideline (if any), e.g. to check conformance or to understand the required level of adaptation to local constraints. Thus, the mined process information can always be used to understand, adapt and redesign processes so that they become efficient, high-quality processes. Once the optimal process has been devised, it can be implemented through workflow management systems, which in this case we call “careflow systems”.
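Many process discovery algorithms, such as those available in ProM, start from the directly-follows relation between the activities observed in an event log. The sketch below (plain Python, not ProM's API) builds that relation from hypothetical (case, timestamp, activity) events and flags cases containing a transition that a reference model does not allow. Here, the reference model is an assumed set of permitted transitions derived from a guideline. This is a much cruder check than the conformance-checking techniques used in practice, and the activity names are invented.

```python
from collections import Counter, defaultdict

def traces_from_log(events):
    """Group (case_id, timestamp, activity) events into time-ordered traces."""
    by_case = defaultdict(list)
    for case, ts, activity in events:
        by_case[case].append((ts, activity))
    return {case: [a for _, a in sorted(evs)] for case, evs in by_case.items()}

def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces.values():
        dfg.update(zip(trace, trace[1:]))
    return dfg

def deviating_cases(traces, allowed):
    """Cases containing at least one transition not permitted by the reference model."""
    return sorted(case for case, t in traces.items()
                  if any(pair not in allowed for pair in zip(t, t[1:])))

# Hypothetical stroke-care log: (case, hour, activity).
events = [
    (1, 1, "Admission"), (1, 2, "CT scan"), (1, 3, "Thrombolysis"), (1, 4, "Stroke unit"),
    (2, 1, "Admission"), (2, 3, "CT scan"), (2, 5, "Stroke unit"),
    (3, 2, "Admission"), (3, 3, "Stroke unit"), (3, 4, "CT scan"),
]
allowed = {("Admission", "CT scan"), ("CT scan", "Thrombolysis"),
           ("Thrombolysis", "Stroke unit"), ("CT scan", "Stroke unit")}
traces = traces_from_log(events)
print(directly_follows(traces).most_common(1))  # [(('Admission', 'CT scan'), 2)]
print(deviating_cases(traces, allowed))         # [3]
```

Timestamps are what make this possible: without them, the order of actions within a case, and hence the process, could not be reconstructed.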

Medical Areas:


Stroke- Data from the SUN Lombardia registry are used for this research area. These data concern the diagnostic and therapeutic patterns of patients with transient ischemic attacks, hemorrhagic stroke and ischemic stroke. Timestamps are associated with the actions performed, so that processes can be learned. More than 20,000 records are available for our analyses.

Tools: ProM[1]

Further information can be found in our publication repository or by contacting the person responsible for each area.

The activities in this area are carried out within projects funded by the Healthcare Ministry and the Regione Lombardia Healthcare Division.

People working on this topic:

Supervisor: Silvana Quaglini

Collaborators: Silvia Panzarasa

External Collaborations:

Giuseppe Micieli, Anna Cavallini, IRCCS C. Mondino, Pavia, Italy.

Giampaolo Merlini, Giovanni Palladini, IRCCS Policlinico San Matteo, Pavia, Italy

Stefania Montani, Università del Piemonte Orientale, Italy