The Value of Open-Source

Leveraging free, code-first tools to iterate toward advanced analytics

Alex Zajichek

Research Data Scientist, Cleveland Clinic

February 27, 2025

What is AI, Anyway?

Two Sides of a Coin


AI as Tools

  • Pre-built products like ChatGPT, Gemini, etc.
  • Used (or purchased) and adapted to our tasks
  • Perceived as productivity tools
  • Dominates the conversation

AI as Data Science

  • Data infrastructure, analytical thinking, statistical reasoning, tools for facilitation
  • How we use data to help inform decision making
  • The building blocks of AI itself

Blurred Lines



  • We tend to focus on the first and bypass the second, conflating the two views
  • This leads to ambiguity and confusion (see figure)

Figure: The confusing nature of AI-related fields


The Horse Before the Cart

Current State

  • Gartner (2018): 87% of organizations have low BI and analytics maturity [1]

Figure: Gartner Analytics Maturity Model


Back to Basics

  • Are you capturing what’s important? (Data infrastructure)
    • “If I knew X at time Y, I could do Z”
  • How are you using your data? (Reporting/analytical thinking)
  • What are your limitations? (Tools, skills, time, etc.)
  • Are AI tools really the solution?
    • “AI for the sake of AI is a losing proposition” [2]. Be intentional!

The Case for Open-Source

What Is Open-Source?

Background

  • In essence, free and open software (Wikipedia)
  • Think of it as community-built
  • Like a workshop for your raw materials (with directions)

Benefits in Data Science

  • Code-first approach gives flexibility and control
    • Art + science
  • Use it to facilitate an analytical approach
    • Building blocks of AI
  • Low-cost iteration and experimentation (anyone can do it)


The R Programming Language

Background

  • A free and open-source functional programming language
  • Developed for statistical computing, but has long since expanded to much broader usage
  • Advanced through packages developed by the community (Package list)
  • Commonly used in the RStudio IDE
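
A minimal sketch of the two ideas above, using only base R and the built-in `trees` data set: functions are passed around as values, and statistical modeling is available without extra packages. The model form here is an illustrative assumption, not a recommendation.

```r
# Functions are values: pass an anonymous function to sapply()
# to summarise every column of the built-in `trees` data set
summaries <- sapply(trees, function(x) c(mean = mean(x), sd = sd(x)))
summaries

# Statistical computing is built in: fit a linear model of log volume
# on log girth and log height (the model form is an illustrative choice)
fit <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)
coef(fit)
```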

Example

library(plotly) # Load package
plot_ly(
  data = trees, # Access dataset
  x = ~Height,
  y = ~Girth,
  size = ~Volume,
  color = ~Volume,
  text = ~paste0("Height: ", Height, "<br>Girth: ", Girth, "<br>Volume: ", Volume),
  height = 300,
  width = 500
)

Map example

# Load packages
library(tidyverse)
library(tidycensus)
library(mapgl)

# Extract a data set to use
dat <- 
  get_acs(
    geography = "tract",
    variables = "B19013_001", # Median household income
    state = "WI",
    year = 2022,
    geometry = TRUE
  ) |>
  
  # Make an information column
  mutate(
    Info = paste0(str_remove(NAME, ";.+$"), "<br>Median Income ($): ", round(estimate))
  )

maplibre() |>
  
  # Focus the mapping area
  fit_bounds(dat) |>
  
  # Fill with the data values
  add_fill_layer(
    id = "mc_acs",
    source = dat,
    fill_outline_color = "black",
    fill_color = 
      interpolate(
        column = "estimate",
        values = range(dat$estimate, na.rm = TRUE),
        stops = c("#f2d37c", "#08519c"),
        na_color = "gray"
      ),
    fill_opacity = 0.50,
    popup = "Info"
  ) |>
  add_legend(
    legend_title = "Median income ($)",
    values = range(dat$estimate, na.rm = TRUE),
    colors = c("#f2d37c", "#08519c")
  )

Quarto for Reproducible Documents

Background

  • Quarto is an open-source technical publishing system
  • Build custom analytical documents in a programmatic way
  • Vehicle for dissemination, promoting automation and reproducibility
  • Integrates well with R, Python, and many other tools
    • This presentation (and the website that contains it) is built with Quarto
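
One way automation might look in practice is rendering the same document once per input. The sketch below assumes the `quarto` R package is installed and that a parameterized template exists; the file name `report.qmd`, its `state` parameter, and the list of states are all hypothetical.

```r
# Hypothetical inputs: "report.qmd" and its `state` parameter are assumptions
states <- c("WI", "OH", "MN")

# Build one render job per state (plain R; Quarto is not needed for this step)
jobs <- lapply(states, function(s) {
  list(
    input = "report.qmd",
    output_file = paste0("report-", s, ".html"),
    execute_params = list(state = s)
  )
})

# Render only if the quarto package and the template are actually available
if (requireNamespace("quarto", quietly = TRUE) && file.exists("report.qmd")) {
  for (job in jobs) {
    do.call(quarto::quarto_render, job)
  }
}
```

Keeping the job list separate from the rendering step makes it easy to inspect what will be produced before anything runs.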

YAML (header/metadata)

---
title: "The Value of Open-Source"
subtitle: "Leveraging free, code-first tools to iterate toward advanced analytics"
author: "Alex Zajichek"
institute: "Research Data Scientist, Cleveland Clinic"
date: "2025-02-27"
date-format: long
format:
  revealjs: 
    theme: [serif, custom.scss]
    footer: "<em>AI Innovations at Work Conference 2025</em>"
    slide-number: true
    incremental: true
---

Markdown (body)

### Background

- [Quarto](https://quarto.org/) is an open-source technical publishing system
- Build documents, reports, websites, presentations, dashboards, books, etc.
- Promotes automation and reproducibility

Shiny for Web Applications

Background

  • Shiny is an R package for building custom, interactive web applications
  • Allows users to interact with data, analytics, models, etc. however you see fit

Example (in R)

library(shiny) # Load package

# The user interface
ui <- 
  fluidPage(
    title = "MyApp", 
    sidebarLayout(
      sidebarPanel(
        selectInput(
          inputId = "color",
          label = "Choose Color",
          choices = c("red", "blue", "green")
        )
      ),
      mainPanel(
        plotOutput("my_plot")
      )
    )
  )

# How the inputs turn to outputs
server <- 
  function(input, output) {
    
    output$my_plot <-
      renderPlot({
        with(trees, plot(Height, Girth, col = input$color))
      })
    
  }

# Run the app
shinyApp(ui, server)

Deployment and integration

  • Last piece of the puzzle: sharing the work
  • Start with free; iterate and pay as needed
    • Not all or nothing!
  • Again: use it to facilitate your analytical strategy

Potential options

How can I learn?

  • Combining
  • Quantfish
  • YouTube
  • Tutorials

ChatGPT

Shiny Assistant

https://gallery.shinyapps.io/assistant/

Tools in Action: riskcalc.org

What is riskcalc.org?

Background

  • Repository of risk calculators for individualized medical decision making
  • Embedded with published predictive models for various clinical outcomes
  • Majority are regression models
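
To make the idea of a regression-based calculator concrete, here is a minimal sketch in base R. Everything in it is an assumption for illustration: the data are simulated, the predictors and coefficients are made up, and `predict_risk()` is a hypothetical helper, not code from riskcalc.org or any published model.

```r
# Simulated data only: these patients, predictors, and coefficients are
# made up for illustration and are not from any published model
set.seed(123)
n <- 500
patients <- data.frame(
  age = round(runif(n, 40, 85)),
  diabetes = rbinom(n, 1, 0.3)
)
linear_predictor <- -6 + 0.07 * patients$age + 0.8 * patients$diabetes
patients$event <- rbinom(n, 1, plogis(linear_predictor))

# Fit a logistic regression model for the binary outcome
fit <- glm(event ~ age + diabetes, data = patients, family = binomial)

# Hypothetical helper: predicted risk for one individual's inputs
predict_risk <- function(age, diabetes) {
  new_patient <- data.frame(age = age, diabetes = diabetes)
  unname(predict(fit, newdata = new_patient, type = "response"))
}

# Individualized risk (a probability between 0 and 1)
predict_risk(age = 65, diabetes = 1)
```

A calculator like this could then be wrapped in a Shiny app, with the function arguments supplied by user inputs.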



Low-Cost Backend

  • Each calculator is a Shiny app

  • Diagram showing path

  • Shiny Server: can be installed on premises, giving you a functioning Shiny server behind your firewall for free

  • All of these tools are free

  • We host on AWS (we pay, but you can start for free)

Acknowledgements

  • QHS
  • AM, KJ, BMD
  • Mike Kattan

Conclusion

  • AI tools can be useful…
  • Open source is the best way to get started and to facilitate analytics because of its low cost and flexibility

References

  1. https://www.gartner.com/en/newsroom/press-releases/2018-12-06-gartner-data-shows-87-percent-of-organizations-have-low-bi-and-analytics-maturity
  2. https://www.kearney.com/service/digital-analytics/ai-assessment-aia-2024-the-drive-for-greater-maturity-scale-and-impact