Scatter Plot Maker

A scatter plot (or scatter diagram) is a chart that shows the values of two variables for each observation in a data set. Each point represents one observation, and its position is determined by that observation's values for the two variables. Scatter plots are essential tools for visualizing relationships between variables.

What is a Scatter Plot?

A scatter plot uses Cartesian coordinates. The data is displayed as a collection of points: the value of one variable sets each point's position on the horizontal axis, and the value of the other variable sets its position on the vertical axis.

When to Use a Scatter Plot

  • To visualize the relationship between two continuous variables
  • To identify correlations (positive, negative, or none)
  • To detect outliers or unusual patterns in data
  • To assess the strength of a relationship between variables
  • To identify clusters or groups within the data
  • To determine if a linear or non-linear relationship exists

Key Features

  • Regression line: Optional best-fit line with equation and R²
  • Custom point colors: Personalize the appearance of your points
  • Interactive controls: Toggle regression line, adjust point size
  • Correlation info: See correlation coefficient and related statistics
  • Download: Save your scatter plot as a PNG image

Interpreting Scatter Plots

  • Positive correlation: Points trend upward from left to right
  • Negative correlation: Points trend downward from left to right
  • No correlation: Points appear randomly scattered
  • Strong correlation: Points closely follow a clear pattern
  • Weak correlation: Points loosely follow a general trend
  • Outliers: Points that fall far from the general pattern
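
The direction and strength described above are usually summarized by the Pearson correlation coefficient r, which ranges from -1 to 1. The following is a minimal sketch of that calculation, not this tool's own code. The function names and the 0.3 and 0.7 cutoffs are assumptions chosen for illustration; there is no universal threshold for "weak" or "strong". Note that r only measures linear association, so a clear curved pattern can still give a value near zero.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe(r, weak=0.3, strong=0.7):
    """Turn r into a rough label. The cutoffs are illustrative assumptions."""
    size = abs(r)
    if size < weak:
        return "little or no linear correlation"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if size >= strong else "weak to moderate"
    return f"{strength} {direction} correlation"


# Made-up example data: hours studied vs. test score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
print(round(r, 3), describe(r))
```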

Regression Line

The regression line (or line of best fit) represents the linear relationship between the two variables. The equation takes the form Y = a + bX, where:

  • a is the y-intercept (where the line crosses the Y-axis)
  • b is the slope (change in Y for each unit change in X)
  • R² (R-squared) indicates how well the line fits the data, on a scale from 0 (no fit) to 1 (perfect fit)
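
As a worked illustration of these three quantities, the sketch below fits Y = a + bX by ordinary least squares, the usual method for a line of best fit; whether this tool uses exactly this procedure is an assumption. The function name fit_line and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = a + bX. Returns (a, b, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all X values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sxy / sxx              # slope: change in Y per unit change in X
    a = mean_y - b * mean_x    # intercept: value of Y where X = 0

    # R² = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # R² is undefined when Y never varies, so report NaN in that case
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return a, b, r_squared


# Made-up example data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r2 = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f}X, R^2 = {r2:.2f}")
```

For a simple two-variable fit like this one, R² equals the square of the correlation coefficient.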

Best Practices

  • Always label your axes clearly with variable names and units
  • Use an appropriate scale that shows the full range of data
  • Consider using different symbols or colors for different groups
  • Add a trend line when exploring linear relationships
  • Be cautious about inferring causation from correlation
  • Look for outliers that might affect your interpretation
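
One way to check whether an outlier is affecting the interpretation is to refit the trend line with each point left out in turn and see how much the slope moves. The sketch below is a simple illustration of that idea under the assumption of a least-squares line; the function names and data are hypothetical, and it is not a substitute for a formal influence diagnostic.

```python
def slope(xs, ys):
    """Least-squares slope of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all X values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_shift_per_point(xs, ys):
    """For each point, return (slope without that point) - (slope with all points)."""
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    full = slope(xs, ys)
    shifts = []
    for i in range(len(xs)):
        # Drop point i and refit on the remaining points
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        shifts.append(slope(rest_x, rest_y) - full)
    return shifts


# Made-up data: five points on the line Y = 2X, plus one point far above it
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 30]
shifts = slope_shift_per_point(xs, ys)
most_influential = max(range(len(shifts)), key=lambda i: abs(shifts[i]))
print("slope with all points:", round(slope(xs, ys), 3))
print("most influential point:", (xs[most_influential], ys[most_influential]))
```

A point whose removal changes the slope far more than any other deserves a closer look before drawing conclusions from the trend line.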