Filtering the clipboard using UNIX filters

I had a few Joomla posts that I wanted to clean up semi-automatically. Here are a few scripts to pass the content of the clipboard (or the current selection) through a UNIX filter.

Filter for cleaning HTML posts

Cleaning up the (HTML) content of the posts was quite time-consuming and very repetitive:

  • removing style attributes (hardcoded fonts…);

  • splitting <p>s containing <br/> into separate <p>s;

  • removing empty <p>s;

  • some additional manual fixes.
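The two paragraph-level steps (splitting on <br/> and removing empty <p>s) can be illustrated in Python on the inner HTML of a single paragraph. This is a string-based sketch, not the author's approach: the split_paragraph name is hypothetical, and it assumes no inline element spans a <br/>, which a real HTML parser handles and a regular expression does not.

```python
import re

# Matches <br>, <br/> and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def split_paragraph(inner_html: str) -> str:
    """Turn the content of one <p> into one <p> per <br/>-separated piece.

    Pieces that are empty or whitespace-only are dropped, which also
    removes the empty paragraphs that doubled or trailing <br/>s would create.
    """
    pieces = BR_TAG.split(inner_html)
    return "".join(f"<p>{piece}</p>" for piece in pieces if piece.strip())
```

For example, split_paragraph("First line<br/>Second line<br/><br/>") returns "<p>First line</p><p>Second line</p>": the trailing <br/>s only produce empty pieces, which are dropped.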

Most of the job could be done by a script (cleanup_html):

#!/usr/bin/env ruby
# Remove some crap from HTML snippets.
# Usage: cleanup_html [FILE] (reads stdin when no FILE is given)

require "nokogiri"

# Read the file given as argument, or stdin if there is none:
html = ARGV[0] ? File.read(ARGV[0]) : $stdin.read
doc = Nokogiri::HTML::DocumentFragment.parse html

# Remove 'style':
doc.css("*[style]").each do |node|
  style = node["style"]
  node.remove_attribute("style")
  $stderr.puts "Removed style: #{style}"
end

# Remove useless spans, keeping their content:
doc.css("span").each do |span|
  $stderr.puts "Unwrapping span: #{span}"
  span.children.each do |x|
    span.before(x)
  end
  span.remove
end

# Split paragraphs on <br/>:
doc.css("p > br").each do |br|
  p = br.parent

  # Move the children before the <br/> into a new <p> inserted before this one:
  new_p = p.document.create_element("p")
  p.children.take_while{ |x| x!=br }.each do |x|
    new_p.add_child x
  end
  p.before(new_p)

  br.remove
end

# Remove empty paragraphs (whitespace only, including non-breaking spaces):
doc.css("p").each do |node|
  if node.element_children.empty? && /\A[[:space:]]*\z/.match(node.inner_text)
    node.remove
  end
end

print doc.to_html
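Where Nokogiri is not available, the first two steps of the script (removing style attributes and unwrapping <span>s) can be approximated with Python's standard html.parser module. This is a swapped-in, stream-based technique rather than the script's tree manipulation: StyleStripper and strip_styles are hypothetical names, the paragraph steps are left out, and doctype declarations and processing instructions are dropped.

```python
from html import escape
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Re-serialize HTML, dropping style attributes and unwrapping <span>s."""

    def __init__(self):
        # Keep entity references in text as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    @staticmethod
    def _attrs(attrs):
        # Attribute values arrive unescaped, so escape them again on output.
        return "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
            if name != "style"
        )

    def handle_starttag(self, tag, attrs):
        if tag != "span":  # a <span> is dropped, its content is kept
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag != "span":  # self-closing tags such as <br/>
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag != "span":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def strip_styles(html: str) -> str:
    parser = StyleStripper()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

Since the parser emits tags as it reads them, unclosed or misnested tags are passed through rather than repaired, unlike Nokogiri, which builds a tree first.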

Filtering the clipboard or selection

I wanted to do a semi-automatic update in order to have feedback on what was happening and fix the remaining issues straightaway. To do this, the filter can be applied to the X11 clipboard:

#!/bin/sh
xclip -out -selection clipboard | cleanup_html | xclip -in -selection clipboard
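This pipeline has one weakness: if the filter fails, xclip -in still runs and the clipboard is overwritten with whatever (possibly nothing) the filter printed. A Python sketch of the same pipeline can check the filter's exit status before writing back. The run_filter and filter_clipboard names are illustrative, and the sketch assumes xclip is installed.

```python
import subprocess
from typing import Sequence


def run_filter(text: str, argv: Sequence[str]) -> str:
    """Pass text through a UNIX filter and return its standard output.

    check=True raises CalledProcessError when the filter exits non-zero.
    """
    result = subprocess.run(
        list(argv), input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


def filter_clipboard(argv: Sequence[str], selection: str = "clipboard") -> None:
    """xclip -out | filter | xclip -in, writing back only if the filter succeeded."""
    current = subprocess.run(
        ["xclip", "-out", "-selection", selection],
        capture_output=True, text=True, check=True,
    ).stdout
    filtered = run_filter(current, argv)  # raises before the clipboard is touched
    # Output is not captured: xclip -in keeps running in the background
    # to own the selection.
    subprocess.run(
        ["xclip", "-in", "-selection", selection],
        input=filtered, text=True, check=True,
    )
```

For instance, filter_clipboard(["base64"]) would encode the clipboard content, while a failing filter raises subprocess.CalledProcessError and leaves the clipboard untouched.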

It is even possible to apply it to the current selection:

#!/bin/sh
sleep 0.1
xdotool key control+c
sleep 0.1
xclip -out -selection clipboard | cleanup_html | xclip -in -selection clipboard
xdotool key control+v

This second script is quite hackish, but it kind of works:

  • the application must use Control-c and Control-v for copy/paste;

  • the weird sleep calls are needed.
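The same sequence can be sketched in Python with the command runner passed in as a parameter, so the steps can be exercised without an X server. The filter_selection name and its delay and run parameters are assumptions for illustration. The sketch keeps both limitations listed above, but with subprocess.run and check=True a failing command stops the sequence before anything is pasted.

```python
import subprocess
import time
from typing import Callable, Sequence

# Anything with subprocess.run's calling convention can serve as the runner.
Runner = Callable[..., subprocess.CompletedProcess]


def filter_selection(argv: Sequence[str], delay: float = 0.1,
                     run: Runner = subprocess.run) -> None:
    """Copy the selection, filter the clipboard, paste the result.

    Assumes the focused application uses Control-c/Control-v;
    the delay mirrors the sleep 0.1 calls of the shell script.
    """
    time.sleep(delay)
    run(["xdotool", "key", "control+c"], check=True)
    time.sleep(delay)
    text = run(["xclip", "-out", "-selection", "clipboard"],
               capture_output=True, text=True, check=True).stdout
    # With check=True a failing filter raises here, before anything is pasted.
    filtered = run(list(argv), input=text,
                   capture_output=True, text=True, check=True).stdout
    run(["xclip", "-in", "-selection", "clipboard"],
        input=filtered, text=True, check=True)
    run(["xdotool", "key", "control+v"], check=True)
```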

Both shell scripts can be generalized into a single one (gui_filter):

#!/bin/sh

mode="$1"
shift

case "$mode" in
    primary | secondary | clipboard)
        xclip -out -selection "$mode" | command "$@" | xclip -in -selection "$mode"
        ;;
    selection)
        # This is a horrible hack.
        # It only works for C-c/C-v keybindings.
        sleep 0.1
        xdotool key control+c
        sleep 0.1
        xclip -out -selection clipboard | command "$@" | xclip -in -selection clipboard
        xdotool key control+v
        ;;
    *)
        echo "usage: gui_filter primary|secondary|clipboard|selection COMMAND [ARG...]" >&2
        exit 1
        ;;
esac

Called with:

# Clean the HTML markup in the clipboard:
gui_filter clipboard cleanup_html

# Base-64 encode the current selection:
gui_filter selection base64

# Base-64 decode the current selection:
gui_filter selection base64 -d
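If gui_filter were ported to Python, argparse could validate the mode before anything touches the clipboard, so a mistyped mode such as seconday is rejected with a usage message. This is a hypothetical sketch of the argument handling only (parse_gui_filter_args is an illustrative name); argparse.REMAINDER passes the filter's own options, like the -d of base64 -d, through untouched.

```python
import argparse

MODES = ("primary", "secondary", "clipboard", "selection")


def parse_gui_filter_args(args=None):
    """Parse `gui_filter MODE COMMAND [ARG...]`, rejecting unknown modes."""
    parser = argparse.ArgumentParser(prog="gui_filter")
    parser.add_argument("mode", choices=MODES)
    # REMAINDER collects the filter command and all of its arguments,
    # including ones that look like options.
    parser.add_argument("command", nargs=argparse.REMAINDER)
    ns = parser.parse_args(args)
    if not ns.command:
        parser.error("missing filter command")  # exits with status 2
    return ns.mode, ns.command
```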

Binding it to a key

Now we can bind a gui_filter command to a temporary global hotkey with this script (keybinder), based on the Keybinder library:

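#!/usr/bin/env python3
# Bind a global hotkey to a given command.
# Examples:
#   keybinder '<Ctrl>e' gui_filter selection base64
#   keybinder '<Ctrl>X' xterm

import os
import signal
import sys

import gi

gi.require_version('Gtk', '3.0')
gi.require_version('Keybinder', '3.0')
from gi.repository import Gtk
from gi.repository import Keybinder


def callback(keystring):
    # Start the command without waiting for it to finish.
    os.spawnvp(os.P_NOWAIT, sys.argv[2], sys.argv[2:])


if len(sys.argv) < 3:
    sys.exit("usage: keybinder HOTKEY COMMAND [ARG...]")

# Restore the default SIGINT action so that Ctrl-C in the terminal
# terminates the process while Gtk.main() is running.
signal.signal(signal.SIGINT, signal.SIG_DFL)
Gtk.init()
Keybinder.init()
if not Keybinder.bind(sys.argv[1], callback):
    sys.exit("could not bind " + sys.argv[1])
Gtk.main()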

The hotkey stays active as long as the keybinder process is running.
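os.spawnvp with P_NOWAIT starts the command without ever waiting for it, so each finished command stays around as a zombie process until keybinder itself exits. One possible variation, sketched here with subprocess.Popen instead of os.spawnvp (a swapped-in approach, not the original script's), keeps the started processes and polls them so finished ones are reaped. The Launcher class is hypothetical.

```python
import subprocess
from typing import List, Sequence


class Launcher:
    """Start commands without blocking, reaping finished ones on each launch."""

    def __init__(self) -> None:
        self.children: List[subprocess.Popen] = []

    def reap(self) -> None:
        # poll() collects the exit status of a finished child,
        # so it no longer lingers as a zombie.
        self.children = [p for p in self.children if p.poll() is None]

    def launch(self, argv: Sequence[str]) -> subprocess.Popen:
        self.reap()
        proc = subprocess.Popen(list(argv))
        self.children.append(proc)
        return proc
```

In the keybinder script, one Launcher would be created at startup and the callback would call launcher.launch(sys.argv[2:]). Finished commands are reaped on the next key press rather than immediately, which keeps the sketch free of signal handling.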

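Conclusion

Any UNIX filter can now be applied to the selection or the clipboard with a hotkey. These lines are alternatives, since they all bind <Ctrl>e:

keybinder '<Ctrl>e' gui_filter selection cleanup_html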
keybinder '<Ctrl>e' gui_filter selection kramdown
keybinder '<Ctrl>e' gui_filter selection cowsay
keybinder '<Ctrl>e' gui_filter selection sort

# More dangerous (the clipboard content is executed as code):
keybinder '<Ctrl>e' gui_filter clipboard bash
keybinder '<Ctrl>e' gui_filter clipboard ruby
keybinder '<Ctrl>e' gui_filter clipboard python

Other solutions

With Emacs

In Emacs, the shell-command-on-region command (bound to M-|) can be used to pass the current region to a given command: by default, the output of the command is displayed in the echo area or in the *Shell Command Output* buffer. Alternatively, C-u M-| can be used to replace the region with the output.

With Vim

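The ! command can be used to pass part of the current buffer through a shell filter: for instance, :%!sort filters the whole buffer, and pressing ! in Visual mode prompts for a command to filter the selected lines (:'<,'>!).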

With Atom

In Atom, the pipe package can be used to filter the current selection through a shell command, replacing it with the output.