Your Mission
Use publicly available datasets to produce an insightful and innovative analysis of diabetes or obesity in Denmark, Europe, or the world. You choose the questions to ask and decide on your analytical and visualization approach. We have proposed some questions, but you are welcome to pose your own and to bring in your own dataset(s). The newly released IBM Data Science Experience (DSX) platform will be available to you, or you can use your own preferred tool.
Possible Data Sources to Explore:
- World Health Organization Health Data & Statistics: http://www.who.int/healthinfo/statistics/en/
- Danish eHealth Data: http://esundhed.dk/Sider/Forside.aspx
- Medstat – Danish site on drug utilization: http://medstat.dk/en
- Data.gov Catalog (US): http://catalog.data.gov/dataset?q=diabetes&sort=score+desc%2C+name+asc
- The Danish Longitudinal Study on Ageing: Click here (long link)
- US Centers for Disease Control and Prevention (CDC) Diabetes Home Page: http://www.cdc.gov/diabetes/home/index.html
- CDC National Health & Nutrition Examination Survey (NHANES): http://www.cdc.gov/nchs/data_access/ftp_data.htm
- IDF Diabetes Atlas: http://www.diabetesatlas.org/resources/2015-atlas.html
- Healthdata.gov (US): http://www.healthdata.gov/browse?query=diabetes
- Open Data Network: http://www.opendatanetwork.com/search?q=diabetes
- UC Irvine Machine Learning Repository (#1): https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008
- UC Irvine Machine Learning Repository (#2): https://archive.ics.uci.edu/ml/datasets/Diabetes
- Data.Medicare.gov (US): https://data.medicare.gov/
- UC Irvine Machine Learning Repository (#3): https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
- Data.CMS.gov (US): https://data.cms.gov/
- CMS Physician Referral Data (available on Docgraph site or on the CMS site): https://www.docgraph.com/referral-data/
- Open Humans (the site provides an API for extracting publicly available health and fitness data): https://www.openhumans.org/public-data-api/
You can come up with your own question and/or datasets, or consider using one of the ideas below. We are looking for an advanced data science and/or visualisation approach and judging for prizes will take that into consideration.
Idea 1:
Create a visualization of physician relationships/referral patterns, and develop a methodology for quantifying the likely influence of individuals within physician networks. Possible dataset to use: CMS Physician Referrals data available at https://www.docgraph.com/referral-data/ (also on CMS site, but we recommend the data on docgraph)
If time permits, overlay pharmaceutical spending patterns on physicians to see how well promotional investment aligns with the most “networked” prescribers.
Dataset for that: https://openpaymentsdata.cms.gov/
Remember to focus on physicians who treat diabetes.
You may find the information on Docgraph’s Google group useful: https://groups.google.com/forum/#!forum/docgraph
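One possible starting point for quantifying influence in a referral network is a PageRank-style score, where a physician ranks highly when many referrals flow to them from other well-connected physicians. The sketch below is only an illustration under stated assumptions: the edge list is a made-up toy example, the tuple layout (referrer, recipient, referral count) is hypothetical and is not the actual DocGraph file format, and PageRank is just one of several centrality measures you could choose.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank over (referrer, recipient, weight) tuples."""
    edges = list(edges)  # the edge list is traversed once per iteration
    out_weight = defaultdict(float)
    nodes = set()
    for src, dst, weight in edges:
        out_weight[src] += weight
        nodes.update((src, dst))

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by physicians with no outgoing referrals is spread evenly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for src, dst, weight in edges:
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


# Toy referral counts between four hypothetical physicians (not real data).
toy_edges = [("A", "C", 10), ("B", "C", 5), ("C", "D", 8), ("A", "B", 2)]
scores = pagerank(toy_edges)
for physician, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(physician, round(score, 3))
```

The scores sum to 1, so they can be read as shares of overall "referral attention" and later compared against payment totals per physician.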
Idea 2:
Analyze the relationship between obesity and food availability: How does commodity availability and consumption in the United States (fruits, vegetables, meat, grains, etc.) relate to obesity?
Possible Datasets: Obesity Data from NY, USDA ERS Commodity Consumption Data
Present your results as either a creative visualisation or a model/statistical analysis.
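For the statistical route, a simple first step is a Pearson correlation between a commodity's availability and the obesity rate across years or regions. The sketch below assumes you have already aligned the two series; the numbers are invented placeholders for illustration, not values from the USDA ERS or New York datasets, and a correlation alone says nothing about causation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Placeholder values only: per-capita availability of one commodity and
# the obesity rate (%) for the same five hypothetical years.
availability = [50.0, 52.5, 55.0, 54.0, 58.0]
obesity_rate = [24.0, 25.1, 26.3, 26.0, 27.9]
print(round(pearson(availability, obesity_rate), 3))
```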
Idea 3:
Explore diabetes (ICD-10 codes as described here: http://www.icd10data.com/ICD10CM/Codes/E00-E89/E08-E13/E11-) as a cause of death in the US or another geography: What factors (other conditions or patient demographics) are highly correlated with diabetes as a cause of death? Has this changed over time?
Possible datasets available at: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm#Mortality_Multiple (raw data) or at https://www.kaggle.com/cdc/mortality (in a relational format).
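To look at change over time, one option is to flag each death record that mentions a diabetes code and compute the flagged share per year. The sketch below is a rough illustration: the record layout (a `year` field and a list of `codes`) is hypothetical and does not reflect the actual CDC or Kaggle schema, the records are made up, and the default `E11` prefix simply follows the code family linked above. Codes are normalized because files may store them with or without the dot.

```python
from collections import Counter


def mentions_code(codes, prefixes=("E11",)):
    """True if any code starts with one of the prefixes (dots/case ignored)."""
    prefixes = tuple(prefixes)
    for code in codes:
        normalized = code.replace(".", "").strip().upper()
        if normalized.startswith(prefixes):
            return True
    return False


def yearly_share(records, prefixes=("E11",)):
    """Share of records per year that mention a matching code."""
    totals = Counter()
    hits = Counter()
    for record in records:
        totals[record["year"]] += 1
        if mentions_code(record["codes"], prefixes):
            hits[record["year"]] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Made-up records for illustration only.
toy_records = [
    {"year": 2014, "codes": ["E11.9", "I10"]},
    {"year": 2014, "codes": ["I25.1"]},
    {"year": 2015, "codes": ["e119"]},
    {"year": 2015, "codes": ["E11.65", "N18.3"]},
]
print(yearly_share(toy_records))
```

The other codes on each flagged record (here `I10`, `N18.3`) are the natural input for a co-occurrence count of accompanying conditions.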
Idea 4:
Check out the Danish Longitudinal Study on Ageing. There are many possible datathon projects in that data! Here's an idea: What inferences can you make about the effect of social networks & relationships, and psychological well-being on diabetes?
Create a visualization of how that may or may not evolve over time. Alternatively, identify correlations between diabetes and other qualitative factors. Dataset can be downloaded here
Idea 5:
Evaluate the relationship between health (specifically diabetes and obesity diagnoses) and nutritional status in the US using the National Health and Nutrition Examination Survey (NHANES).
Dataset is available here: http://www.cdc.gov/nchs/data_access/ftp_data.htm
Additional questions for your consideration (which may require datasets not in above list) are:
- Can urban livability or other factors (e.g. access to public transportation, green spaces, crime rate) predict the diabetes rates in cities? (This question could also address obesity.)
- Can social media data be used to predict/monitor attitudes toward obesity (or rates of obesity and/or diabetes) in different areas of the world (or a specific region or country)?
- What is the best way to visualize the impact of diabetes in Denmark, the EU, the US, or another geographic area?
- Can you predict who will be diagnosed with diabetes in the next year?
- Can you identify other health and lifestyle implications of diabetes by examining regional demographic and disease datasets?
- In which new ways could patient- and health system-generated data be used for the benefit of patients with diabetes?
- Does increasing use of fitness-tracking devices (e.g., Fitbit) correlate with lower rates of obesity (and therefore diabetes)?
- Are there networks of patient or prescriber influence that can be leveraged to encourage better prevention, diagnosis, and treatment of diabetes? Consider social media or some of the data available at cms.gov (referrals).
Why diabetes?
Diabetes is a global pandemic. The International Diabetes Federation (IDF) estimates that 415 million people have diabetes today, a figure it projects could rise to 642 million by 2040. Complications from poorly managed diabetes include cardiovascular disease, kidney disease, nerve damage, and blindness. Diabetes and its complications account for 12% of global health expenditure (US$673 billion). In the United States alone, over 9% of the population has diabetes, and approximately 25% of those cases are undiagnosed. Obesity (defined as a body mass index of 30 or higher) is the single biggest predictor of type 2 diabetes. Almost 90% of people living with type 2 diabetes are overweight or have obesity.
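If your dataset contains height and weight rather than an obesity flag, the body mass index threshold mentioned above can be derived directly. The sketch below uses the standard formula (weight in kilograms divided by height in metres squared) and the commonly used WHO adult cut-offs; the function names and sample values are illustrative only, and different cut-offs apply to children.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms over height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Adult category using the commonly cited WHO cut-offs."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


# Illustrative values only.
for weight, height in [(70, 1.75), (90, 1.75), (95, 1.75)]:
    value = bmi(weight, height)
    print(weight, height, round(value, 1), bmi_category(value))
```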
Diabetes is a group of disorders characterized by chronic high blood glucose levels (hyperglycemia) that arise when the body fails to produce enough insulin, or to respond to it properly, to keep glucose levels in check. There are two main types of diabetes:
- Type 1 diabetes, which often occurs in children or adolescents, is caused by the body’s inability to make insulin, and
- Type 2 diabetes, which occurs as a result of the body’s inability to react properly to insulin (insulin resistance).
Type 2 diabetes is more prevalent than type 1 diabetes, accounting for approximately 90% of all diabetes cases. It is predominantly diagnosed after the age of forty; however, it is now being diagnosed in all age ranges, including children and adolescents. Diabetes is one of the major causes of mortality in the U.S. and the world.
There are many risk factors for type 2 diabetes, including age, race, pregnancy, stress, certain medications, genetics, and high cholesterol. However, the single best predictor of type 2 diabetes is being overweight or obese. Excess weight puts added pressure on the body’s ability to use insulin to control blood sugar levels, which makes people who are overweight or obese more likely to develop diabetes. The rapid increase in diabetes occurrence in the United States is mostly attributed to the growing prevalence of obesity.
You can read more about diabetes here: https://www.idf.org/sites/default/files/EN_6E_Atlas_Full_0.pdf