Counting values with adviz

Counting values

Count the unique values in a collection of items

  • Count in absolute and proportional (percentage) terms
  • Show the cumulative count and percent
  • Determine the number of top items to show
  • Show all remaining items lumped into one item named Others:
  • Change the name of the items, as well as the caption and various other styling options
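
As a rough illustration of what these features compute, here is a minimal standard-library sketch. It is not adviz's actual implementation: the helper name count_with_others and its parameters are hypothetical, chosen only to mirror the options described above.

```python
from collections import Counter


def count_with_others(items, show_top=10, others_label="Others:"):
    """Hypothetical helper: rows of (value, count, cum_count, perc, cum_perc)."""
    counts = Counter(items).most_common()  # sorted by count, descending
    top, rest = counts[:show_top], counts[show_top:]
    if rest:
        # Lump everything beyond the top values into a single row
        top.append((others_label, sum(n for _, n in rest)))
    total = sum(n for _, n in top)
    rows, running = [], 0
    for value, n in top:
        running += n
        rows.append((value, n, running, n / total, running / total))
    return rows


rows = count_with_others(["red"] * 5 + ["blue"] * 3 + ["green", "pink"], show_top=2)
for row in rows:
    print(row)
```

With show_top=2, the two least frequent values end up in one Others: row, and the last cumulative proportion is always 1.0.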

Installation and usage

Installation:

python3 -m pip install adviz

Usage:

import adviz
adviz.value_counts_plus([item_1, item_2, ... item_n], ...)

Usage

Generate a random list of 10,000 colors, plus 240 missing values

import random
import numpy as np
import matplotlib as mpl
import pandas as pd
import plotly.express as px

colors = list(mpl.colors.cnames.keys())
colors = random.choices(colors, weights=[0.9, 0.04, 0.05, 0.09]*37, k=10_000)
colors += [np.nan] * 240
colors[:20]
['mediumslateblue',
 'purple',
 'darkred',
 'lightgreen',
 'coral',
 'red',
 'forestgreen',
 'fuchsia',
 'olive',
 'deeppink',
 'navajowhite',
 'darkmagenta',
 'olivedrab',
 'plum',
 'salmon',
 'cyan',
 'cyan',
 'olivedrab',
 'lightgreen',
 'indigo']

View default output

import adviz
adviz.value_counts_plus(colors)

Counts of data

  data count cum. count % cum. %
1 lightsteelblue 255 255 2.5% 2.5%
2 lavenderblush 254 509 2.5% 5.0%
3 mediumslateblue 249 758 2.4% 7.4%
4 indigo 246 1,004 2.4% 9.8%
5 lightgreen 243 1,247 2.4% 12.2%
6 nan 240 1,487 2.3% 14.5%
7 plum 237 1,724 2.3% 16.8%
8 burlywood 237 1,961 2.3% 19.2%
9 lightseagreen 237 2,198 2.3% 21.5%
10 deeppink 236 2,434 2.3% 23.8%
11 Others: 7,806 10,240 76.2% 100.0%

Change the number of top values: show_top=n

adviz.value_counts_plus(
    colors,
    show_top=5)

Counts of data

  data count cum. count % cum. %
1 lightsteelblue 255 255 2.5% 2.5%
2 lavenderblush 254 509 2.5% 5.0%
3 mediumslateblue 249 758 2.4% 7.4%
4 indigo 246 1,004 2.4% 9.8%
5 lightgreen 243 1,247 2.4% 12.2%
6 Others: 8,993 10,240 87.8% 100.0%

Remove styling: style=False

adviz.value_counts_plus(
    colors,
    show_top=5,
    style=False)
data count cum_count perc cum_perc
0 lightsteelblue 255 255 0.024902 0.024902
1 lavenderblush 254 509 0.024805 0.049707
2 mediumslateblue 249 758 0.024316 0.074023
3 indigo 246 1004 0.024023 0.098047
4 lightgreen 243 1247 0.023730 0.121777
5 Others: 8993 10240 0.878223 1.000000

Styled vs non-styled tables

  • In styled tables, indexing starts at 1 to be more accessible to a non-technical audience
  • In non-styled tables, indexing remains 0-based for further processing
  • In styled tables, column headers and numbers are displayed in a more readable way

Counts of data

  data count cum. count % cum. %
1 lightsteelblue 255 255 2.5% 2.5%
2 lavenderblush 254 509 2.5% 5.0%
3 mediumslateblue 249 758 2.4% 7.4%
4 indigo 246 1,004 2.4% 9.8%
5 lightgreen 243 1,247 2.4% 12.2%
6 Others: 8,993 10,240 87.8% 100.0%

data count cum_count perc cum_perc
0 lightsteelblue 255 255 0.024902 0.024902
1 lavenderblush 254 509 0.024805 0.049707
2 mediumslateblue 249 758 0.024316 0.074023
3 indigo 246 1004 0.024023 0.098047
4 lightgreen 243 1247 0.023730 0.121777
5 Others: 8993 10240 0.878223 1.000000
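
The two tables hold the same numbers; only the presentation differs. As a small sketch of that relationship (plain Python format specifiers, not necessarily how adviz formats its tables internally), the raw values from the non-styled table can be turned into the styled strings like this:

```python
# Values copied from the non-styled table above: (count, cum_count, perc)
raw_rows = [
    (255, 255, 0.024902),
    (246, 1004, 0.024023),
    (8993, 10240, 0.878223),
]

# ',' adds a thousands separator; '.1%' multiplies by 100 and keeps one decimal
styled = [(f"{c:,}", f"{cc:,}", f"{p:.1%}") for c, cc, p in raw_rows]

for row in styled:
    print(*row)
```

Keeping the raw floats (style=False) is the better choice when the counts feed into further calculations, since the formatted strings can no longer be summed or compared numerically.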

Change the size of the table: size=5

adviz.value_counts_plus(
    colors,
    size=5)

Counts of data

  data count cum. count % cum. %
1 lightsteelblue 255 255 2.5% 2.5%
2 lavenderblush 254 509 2.5% 5.0%
3 mediumslateblue 249 758 2.4% 7.4%
4 indigo 246 1,004 2.4% 9.8%
5 lightgreen 243 1,247 2.4% 12.2%
6 nan 240 1,487 2.3% 14.5%
7 plum 237 1,724 2.3% 16.8%
8 burlywood 237 1,961 2.3% 19.2%
9 lightseagreen 237 2,198 2.3% 21.5%
10 deeppink 236 2,434 2.3% 23.8%
11 Others: 7,806 10,240 76.2% 100.0%

Change the size of the table: size=20

adviz.value_counts_plus(
    colors,
    size=20)

Counts of data

  data count cum. count % cum. %
1 lightsteelblue 255 255 2.5% 2.5%
2 lavenderblush 254 509 2.5% 5.0%
3 mediumslateblue 249 758 2.4% 7.4%
4 indigo 246 1,004 2.4% 9.8%
5 lightgreen 243 1,247 2.4% 12.2%
6 nan 240 1,487 2.3% 14.5%
7 plum 237 1,724 2.3% 16.8%
8 burlywood 237 1,961 2.3% 19.2%
9 lightseagreen 237 2,198 2.3% 21.5%
10 deeppink 236 2,434 2.3% 23.8%
11 Others: 7,806 10,240 76.2% 100.0%

Sort Others:

  • Is the combined Others: row significant compared with the top values?
  • Where would Others: rank if it were sorted by count together with the other rows?

adviz.value_counts_plus(
    colors,
    sort_others=True)

Counts of data

  data count cum. count % cum. %
1 Others: 7,806 7,806 76.2% 76.2%
2 lightsteelblue 255 8,061 2.5% 78.7%
3 lavenderblush 254 8,315 2.5% 81.2%
4 mediumslateblue 249 8,564 2.4% 83.6%
5 indigo 246 8,810 2.4% 86.0%
6 lightgreen 243 9,053 2.4% 88.4%
7 nan 240 9,293 2.3% 90.8%
8 plum 237 9,530 2.3% 93.1%
9 burlywood 237 9,767 2.3% 95.4%
10 lightseagreen 237 10,004 2.3% 97.7%
11 deeppink 236 10,240 2.3% 100.0%
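
The reordering can be reproduced by hand. The sketch below is an illustration in plain Python, not adviz's code: it takes the (value, count) pairs from the default table, sorts Others: together with the other rows, and recomputes the cumulative columns.

```python
# (value, count) pairs copied from the default table above, "Others:" last
rows = [
    ("lightsteelblue", 255), ("lavenderblush", 254), ("mediumslateblue", 249),
    ("indigo", 246), ("lightgreen", 243), ("nan", 240), ("plum", 237),
    ("burlywood", 237), ("lightseagreen", 237), ("deeppink", 236),
    ("Others:", 7806),
]

# Sort every row, including "Others:", by count (the sort is stable for ties)
ordered = sorted(rows, key=lambda row: row[1], reverse=True)

total = sum(count for _, count in ordered)
running = 0
table = []
for value, count in ordered:
    running += count
    table.append((value, count, running, f"{running / total:.1%}"))

for row in table:
    print(*row)
```

Because the cumulative columns depend on row order, moving Others: to the top changes every cumulative value while the individual counts stay the same.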

Counting continuous data

  • Make continuous data discrete by binning it with pandas.cut.

  • Example: Count countries by life expectancy, binned into ten-year intervals.

gm = px.data.gapminder().query('year == 2007')
gm['lifeExp_bin'] = pd.cut(gm['lifeExp'], range(0, 100, 10))
gm.sort_values('lifeExp', ascending=False).head(10)
country continent year lifeExp pop gdpPercap iso_alpha iso_num lifeExp_bin
803 Japan Asia 2007 82.603 127467972 31656.06806 JPN 392 (80, 90]
671 Hong Kong, China Asia 2007 82.208 6980412 39724.97867 HKG 344 (80, 90]
695 Iceland Europe 2007 81.757 301931 36180.78919 ISL 352 (80, 90]
1487 Switzerland Europe 2007 81.701 7554661 37506.41907 CHE 756 (80, 90]
71 Australia Oceania 2007 81.235 20434176 34435.36744 AUS 36 (80, 90]
1427 Spain Europe 2007 80.941 40448191 28821.06370 ESP 724 (80, 90]
1475 Sweden Europe 2007 80.884 9031088 33859.74835 SWE 752 (80, 90]
767 Israel Asia 2007 80.745 6426679 25523.27710 ISR 376 (80, 90]
539 France Europe 2007 80.657 61083916 30470.01670 FRA 250 (80, 90]
251 Canada Americas 2007 80.653 33390141 36319.23501 CAN 124 (80, 90]

Counts of continuous data

adviz.value_counts_plus(
    gm['lifeExp_bin'],
    name='Life Expectancy - 2007',
    background_gradient='RdBu')

Counts of Life Expectancy - 2007

  Life Expectancy - 2007 count cum. count % cum. %
1 (70, 80] 70 70 49.3% 49.3%
2 (50, 60] 24 94 16.9% 66.2%
3 (40, 50] 18 112 12.7% 78.9%
4 (60, 70] 16 128 11.3% 90.1%
5 (80, 90] 13 141 9.2% 99.3%
6 (30, 40] 1 142 0.7% 100.0%
7 (0, 10] 0 142 0.0% 100.0%
8 (10, 20] 0 142 0.0% 100.0%
9 (20, 30] 0 142 0.0% 100.0%

  • Seventy countries (49.3%) have a life expectancy in the (70, 80] interval.
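
Labels such as (70, 80] are right-closed: by default pandas.cut excludes the left edge and includes the right one. The following standard-library sketch illustrates that rule with the same edges; right_closed_bin is a hypothetical helper written for this example, not part of pandas or adviz.

```python
from bisect import bisect_left

edges = list(range(0, 100, 10))  # 0, 10, ..., 90: the same edges passed to pandas.cut


def right_closed_bin(value, edges):
    """Return the (left, right] interval containing value, or None if outside."""
    i = bisect_left(edges, value)
    if i == 0 or i == len(edges):
        return None  # at or below the first edge, or above the last edge
    return (edges[i - 1], edges[i])


print(right_closed_bin(82.603, edges))  # Japan's life expectancy in the table above
print(right_closed_bin(80, edges))      # exactly 80 falls in (70, 80], not (80, 90]
```

A value outside the edges, such as 0 or 95 here, gets no bin at all, so it is worth checking that the chosen edges cover the full range of the data.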

Change the theme of the table

adviz.value_counts_plus(
    colors,
    background_gradient='magma')

Counts of data

  data count cum. count % cum. %
1 lightsteelblue 255 255 2.5% 2.5%
2 lavenderblush 254 509 2.5% 5.0%
3 mediumslateblue 249 758 2.4% 7.4%
4 indigo 246 1,004 2.4% 9.8%
5 lightgreen 243 1,247 2.4% 12.2%
6 nan 240 1,487 2.3% 14.5%
7 plum 237 1,724 2.3% 16.8%
8 burlywood 237 1,961 2.3% 19.2%
9 lightseagreen 237 2,198 2.3% 21.5%
10 deeppink 236 2,434 2.3% 23.8%
11 Others: 7,806 10,240 76.2% 100.0%

Get the reverse of the theme by adding _r

adviz.value_counts_plus(
    colors,
    background_gradient='magma_r')

Counts of data

  data count cum. count % cum. %
1 lightsteelblue 255 255 2.5% 2.5%
2 lavenderblush 254 509 2.5% 5.0%
3 mediumslateblue 249 758 2.4% 7.4%
4 indigo 246 1,004 2.4% 9.8%
5 lightgreen 243 1,247 2.4% 12.2%
6 nan 240 1,487 2.3% 14.5%
7 plum 237 1,724 2.3% 16.8%
8 burlywood 237 1,961 2.3% 19.2%
9 lightseagreen 237 2,198 2.3% 21.5%
10 deeppink 236 2,434 2.3% 23.8%
11 Others: 7,806 10,240 76.2% 100.0%

Remove missing values: dropna

adviz.value_counts_plus(
    colors,
    dropna=True)

Counts of data

  data count cum. count % cum. %
1 lightsteelblue 255 255 2.5% 2.5%
2 lavenderblush 254 509 2.5% 5.1%
3 mediumslateblue 249 758 2.5% 7.6%
4 indigo 246 1,004 2.5% 10.0%
5 lightgreen 243 1,247 2.4% 12.5%
6 plum 237 1,484 2.4% 14.8%
7 burlywood 237 1,721 2.4% 17.2%
8 lightseagreen 237 1,958 2.4% 19.6%
9 red 236 2,194 2.4% 21.9%
10 deeppink 236 2,430 2.4% 24.3%
11 Others: 7,570 10,000 75.7% 100.0%
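
Dropping missing values shrinks the denominator, which is why the percentages above are slightly higher than in the default table. A tiny sketch of that effect in plain Python, using made-up data and a hypothetical is_missing helper:

```python
import math
from collections import Counter

nan = float("nan")
items = ["red"] * 6 + ["blue"] * 2 + [nan] * 2  # made-up data for illustration


def is_missing(x):
    """Hypothetical helper: True only for float NaN values."""
    return isinstance(x, float) and math.isnan(x)


kept = [x for x in items if not is_missing(x)]

share_with_nan = Counter(items)["red"] / len(items)   # 6 of 10 items
share_without_nan = Counter(kept)["red"] / len(kept)  # 6 of 8 items

print(share_with_nan, share_without_nan)
```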

Use different symbols for thousands and decimal

adviz.value_counts_plus(
    colors,
    background_gradient='summer',
    thousands='.',
    decimal=',')

Counts of data

  data count cum. count % cum. %
1 lightsteelblue 255 255 2,5% 2,5%
2 lavenderblush 254 509 2,5% 5,0%
3 mediumslateblue 249 758 2,4% 7,4%
4 indigo 246 1.004 2,4% 9,8%
5 lightgreen 243 1.247 2,4% 12,2%
6 nan 240 1.487 2,3% 14,5%
7 plum 237 1.724 2,3% 16,8%
8 burlywood 237 1.961 2,3% 19,2%
9 lightseagreen 237 2.198 2,3% 21,5%
10 deeppink 236 2.434 2,3% 23,8%
11 Others: 7.806 10.240 76,2% 100,0%
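
For readers who want the same convention outside the styled table, here is a minimal sketch in plain Python. The helper swap_separators is hypothetical and only shows one way to swap the two symbols in an already formatted number; it is not how adviz implements the thousands and decimal options.

```python
def swap_separators(text, thousands=".", decimal=","):
    """Hypothetical helper: swap the default ',' and '.' in a formatted number."""
    # Translate both characters in one pass so they do not overwrite each other
    return text.translate(str.maketrans({",": thousands, ".": decimal}))


count = swap_separators(f"{10240:,}")       # "10,240" before swapping
percent = swap_separators(f"{0.762304:.1%}")  # "76.2%" before swapping

print(count, percent)
```

Doing the swap in a single translate call matters: two chained str.replace calls would turn the freshly inserted separators back into the original ones.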

Rename the data column

adviz.value_counts_plus(
    colors,
    background_gradient='cool_r',
    name='colors')

Counts of colors

  colors count cum. count % cum. %
1 lightsteelblue 255 255 2.5% 2.5%
2 lavenderblush 254 509 2.5% 5.0%
3 mediumslateblue 249 758 2.4% 7.4%
4 indigo 246 1,004 2.4% 9.8%
5 lightgreen 243 1,247 2.4% 12.2%
6 nan 240 1,487 2.3% 14.5%
7 plum 237 1,724 2.3% 16.8%
8 burlywood 237 1,961 2.3% 19.2%
9 lightseagreen 237 2,198 2.3% 21.5%
10 deeppink 236 2,434 2.3% 23.8%
11 Others: 7,806 10,240 76.2% 100.0%

Convert to raw HTML: to_html

html_table = adviz.value_counts_plus(
    colors,
    background_gradient='winter_r',
    thousands='.',
    decimal=',',
    name='Colors'
).to_html()

print(html_table[:600])
<style type="text/css">
#T_9b1a8_row0_col1, #T_9b1a8_row0_col2, #T_9b1a8_row0_col3, #T_9b1a8_row0_col4, #T_9b1a8_row1_col1, #T_9b1a8_row1_col3, #T_9b1a8_row2_col1, #T_9b1a8_row2_col3, #T_9b1a8_row3_col1, #T_9b1a8_row3_col3, #T_9b1a8_row4_col1, #T_9b1a8_row4_col3, #T_9b1a8_row5_col1, #T_9b1a8_row5_col3, #T_9b1a8_row6_col1, #T_9b1a8_row6_col3, #T_9b1a8_row7_col1, #T_9b1a8_row7_col3, #T_9b1a8_row8_col1, #T_9b1a8_row8_col3, #T_9b1a8_row9_col1, #T_9b1a8_row9_col3 {
  background-color: #00ff80;
  color: #000000;
}
#T_9b1a8_row1_col2, #T_9b1a8_row1_col4 {
  background-color: #00f982;
  color: #000000
