Visualize your command line history with word clouds


I’ve been wanting to do a simple analysis of which Linux commands I use the most. It made sense to do this as a word cloud visualization so I put together a Python script that will:

  1. Load in the user’s .bash_profile file to get any aliases that have been set up,
  2. Load in the .bash_history file and strip the arguments from each command,
  3. Look up any aliases that have been used and convert those into the mapped/assigned Linux commands (or script names if those are assigned),
  4. Generate the word cloud visualization and save to a PDF file.

I’ll walk through the steps with the Python code below.

Load package dependencies

Here are the Python packages used:

from os.path import expanduser # More reliable way to get user directory.
from wordcloud import WordCloud # For word cloud viz.
import matplotlib.pyplot as plt # For outputting the visualizations.
import time # For output filename.

If you don’t have the wordcloud package installed, you will need to install it from the command line:
pip install wordcloud

Back to the Python code, we’ll need to get the user directory before we can load in the profile and history files:
home = expanduser("~") # Get user directory.
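As a small sketch of how the home directory is used in the steps that follow, the snippet below builds the two file paths and reports whether each file exists. The variable names profile_path and history_path are illustrative and not part of the original script. On some setups aliases live in ~/.bashrc rather than ~/.bash_profile, so the check can be a useful first step.

```python
import os.path

home = os.path.expanduser("~")  # Current user's home directory.

# Hypothetical variable names for the two files used in this walkthrough.
profile_path = os.path.join(home, ".bash_profile")
history_path = os.path.join(home, ".bash_history")

for path in (profile_path, history_path):
    # os.path.exists reports whether the file is present on this machine.
    print(path, os.path.exists(path))
```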

Text cleaning

We’ll also need a function for removing unwanted characters, such as a leading ./ or quote characters, from the commands:

def clean_string(s):
    # Each key is removed by replacing it with an empty string.
    remove_chars = {"./": "", "'": "", "\"": ""}
    for x, y in remove_chars.items():
        s = s.replace(x, y)
    return s
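To make the behavior concrete, here is a quick sketch that runs the function on a few made-up inputs (the sample strings are illustrative only). Note that str.replace removes every occurrence of a pattern, not just a leading one, which the last call shows.

```python
def clean_string(s):
    # Same logic as above: delete "./" and both kinds of quote characters.
    remove_chars = {"./": "", "'": "", "\"": ""}
    for x, y in remove_chars.items():
        s = s.replace(x, y)
    return s

# Hypothetical sample inputs for illustration.
print(clean_string("./deploy.sh"))  # deploy.sh
print(clean_string("'ls'"))         # ls
print(clean_string("../run.sh"))    # .run.sh ("./" is removed wherever it occurs)
```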

We’ll also need to extract command(s) that are assigned to an alias because aliases are often unique to the developer’s naming style. The assigned commands for the alias will tell us more about which commands are used most often:
def parse_alias(s):
    commands = []
    splits = s.split("; ") # An alias may chain several commands with ";".
    for split in splits:
        commands.append(split.split(" ")[0]) # Keep only the command name.
    return commands
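As an illustration, the sketch below calls the function on two made-up alias bodies. Only commands separated by "; " are split apart; commands chained with a pipe or && stay together, so only the first one is reported.

```python
def parse_alias(s):
    # Same logic as above: first word of each "; "-separated command.
    commands = []
    for split in s.split("; "):
        commands.append(split.split(" ")[0])
    return commands

# Hypothetical alias bodies for illustration.
print(parse_alias("cd ~/projects; git status"))  # ['cd', 'git']
print(parse_alias("ls -la | grep py"))           # ['ls']
```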

Load in the user .bash_profile file

Here we will load in the .bash_profile file that will contain any aliases configured for the user:

with open("%s/.bash_profile" % home, "r") as f: # Open bash profile.
    bash_lines = f.read().splitlines() # Read in bash profile.

Create an alias dictionary

With the profile loaded into the bash_lines object, we can loop through each line and create an alias dictionary that uses the alias as the dictionary key and the command string as the value:

aliases = {}
for i in bash_lines:
    i = i.strip()
    if i.startswith("alias ") and "=" in i: # Only process alias definitions.
        i = clean_string(i)
        i = i.replace("alias ", "", 1) # Strip out the leading alias keyword.
        alias, right_string = i.split("=", 1) # Split on the first "=" only.
        aliases[alias] = right_string # Add alias and command string to dict.

We will look up each command in this dictionary when we go through the command line history file.
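To show what the dictionary looks like, here is a minimal sketch that wraps similar logic in a function and runs it on a few made-up profile lines. The helper name build_aliases and the sample lines are assumptions for illustration; it also assumes each alias is defined on one line in the form alias name='command'.

```python
def build_aliases(lines):
    """Hypothetical helper: map alias names to their command strings."""
    aliases = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("alias ") or "=" not in line:
            continue  # Skip comments, exports, and anything else.
        # partition splits on the first "=" only, so "=" inside the command survives.
        name, _, value = line[len("alias "):].partition("=")
        aliases[name.strip()] = value.strip().strip("'\"")
    return aliases

# Made-up profile content for illustration.
sample_profile = [
    "# my aliases",
    "export PATH=$PATH:~/bin",
    "alias ll='ls -la'",
    "alias gs=\"git status\"",
]
print(build_aliases(sample_profile))  # {'ll': 'ls -la', 'gs': 'git status'}
```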

Load in the .bash_history file

with open("%s/.bash_history" % home, "r") as f: # Open bash history.
    command_history = f.read().splitlines() # Read in bash history.
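If HISTTIMEFORMAT is set, bash writes a timestamp line of the form #1700000000 before each entry in the history file, and those lines would otherwise show up as "commands". The sketch below is one possible way to drop them along with blank lines; the helper name filter_history and the sample data are assumptions, not part of the original script.

```python
def filter_history(lines):
    """Hypothetical helper: drop blank lines and '#<epoch>' timestamp lines."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Blank line.
        if line.startswith("#") and line[1:].isdigit():
            continue  # Timestamp entry such as "#1700000000".
        kept.append(line)
    return kept

# Made-up history content for illustration.
sample_history = ["#1700000000", "ls -la", "", "#1700000042", "git status"]
print(filter_history(sample_history))  # ['ls -la', 'git status']
```

In the walkthrough this could be applied as command_history = filter_history(command_history) before the processing step.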

Process the command line history

Loop through the command_history object. Get the first ‘word’ of the command line. (This will either be a Linux command such as ‘ls’ or an alias.) If the first word is in the alias dictionary, get all of the commands (and/or script calls) and add them to the all_commands list.

command_count = {}
all_commands = []
for i in command_history:
    command_list = [] # For holding 1 or more commands.
    command = clean_string(i.split(" ")[0])
    if command in aliases:
        # An alias was used.
        for c in parse_alias(aliases[command]):
            command_list.append(c)
    else:
        command_list.append(command)

The following code for updating the command_count dictionary is not necessary, but I include it just in case. I wrote it before remembering that the wordcloud package counts words and assigns frequencies automatically. It continues the body of the for loop above:
    for c in command_list:
        if c in command_count:
            command_count[c] = command_count[c] + 1
        else:
            command_count[c] = 1

Still inside this inner loop, we append each command to our list of all commands used:
        all_commands.append(c)
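If you do want the counts, for example to print your top commands, the standard library's collections.Counter can replace the manual dictionary update. This is a minimal sketch; the sample list is a made-up stand-in for the all_commands list built above.

```python
from collections import Counter

# Hypothetical stand-in for the all_commands list built above.
all_commands = ["ls", "git", "cd", "ls", "git", "ls"]

# Counter tallies how many times each command appears.
command_count = Counter(all_commands)
print(command_count.most_common(2))  # [('ls', 3), ('git', 2)]
```

A mapping like this can also be handed to WordCloud's generate_from_frequencies method instead of building one large string, which is worth considering if you want to filter or adjust the counts first.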

Generate the word cloud visualization

The generate() method of WordCloud expects a single string of text. We will build that string by joining the Python list with spaces:

text = ' '.join(all_commands)

Instantiate a wordcloud object:
wordcloud = WordCloud(width=1200, height=800, background_color='white',
                      max_font_size=320, collocations=False).generate(text)

Set our output filename:
output_file = "wordcloud_%s.pdf" % time.strftime("%Y-%m-%d-%H%M%S")

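Use matplotlib to render the image, save it as a PDF file, and then display it:
fig = plt.figure(figsize=(9, 6)) # Figure size in inches.
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.tight_layout(pad=0)
fig.savefig(output_file, dpi=300) # Save before plt.show(), which can discard the figure once its window closes.
plt.show()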

You should see something similar to the following:

Visualization of command line history with wordcloud.

The full source code is available on GitHub.