You know the drill: The words you’re looking for can be in Slack, or GitHub, or Google Drive, or Google Sheets, or Zendesk, or…the list goes on. Searching through these silos is a common frustration. It should be frictionless, and this Steam Hose dashboard makes it that way.
It wasn’t my first rodeo. I started this journey in 1996 and have revisited the idea periodically. In 2018 I wrote about a version that was the classic example of The Dopey Thing That Could Possible Work: a webpage that lists search URLs for various services and visits each in its own tab. As silly as it sounds, it was pretty useful to get used to a bit, and not just by me.
Of course I wanted to use the underlying APIs, normalize the results and merge them into a common view. But the effort required to rip out all the APIs made this project more difficult than it was worth. If you’ve done this sort of thing before, most services provide search APIs as well as adapters for your favorite programming language. But each service will have its own way of calling the API, paging the results, and formatting them. These differences create friction that you must overcome in order to work with results consistently.
Advertising
However, when API feuds become frictionless, a lot becomes possible. Effective metasearch is one of them. Steampipe allows you to exit the activity of calling APIs, paging results, and unpacking JSON objects. It calls APIs for you and streams the results to database tables so you can focus entirely on working with the data. This solves the biggest problem you face when building a metasearch dashboard.
Convergence on a schema
The next challenge is to tie the search results to a common schema. SQL is a great environment to do this. The query that drives the dashboard shown in the screencast includes three stanzas that you don’t need to be an SQL Wizard to write. They all follow the same pattern as this for finding GitHub issues.
select 'github_issue' as type, repository_full_name || ' ' || title as source, to_char(created_at, 'YYYY-MM-DD') as date, html_url as link, substring(body from 1 for 200) || '...' as content from github_search_issue where $1 ~ 'github_issue' and query = 'in:body in:comments org:github ' || $2 limit $3
The items in blue are the column names of a database table, in this case. github_search_issue, one of the tables made by Steampipe’s GitHub plugin. The Steampipe hub makes it easy to inspect table column names and descriptions, and shows you examples of how to use table information.
Because retrieving data doesn’t require API calls and JSON unpacking, you can focus on higher-order search syntax, which is a lot to think about, plus the interesting (and fun!) challenge of mapping the source columns to a common schema.
Items in red are the column names that appear in the dashboard. For this dashboard, we have decided that each search result will be mapped to these five columns: kind, source, date, linkand contents. SQL’s AS clause allows each stanza to easily rename its columns to match the schema.
The full query
Here is the full query that drives the dashboard. There are three stanzas like the one above, each written as a CTE (common table expression) with parameters corresponding to the input variables. And there is almost nothing else! Each stanza queries an API-based table (slack_search, github_search_issue, googleworkspace_drive_my_file), selects (and perhaps transforms) columns, then aliases the results to match the schema. All that’s left for UNION are the three CTEs, which act as temporary tables, and order the results.
with slack as ( select 'slack' as type, user_name || ' in #' || (channel ->> 'name')::text as source, to_char(timestamp, 'YYYY-MM-DD') as date, permalink as link, substring(text from 1 for 200) as content from slack_search where $1 ~ 'slack' and query = 'in:#steampipe after:${local.config.slack_date} ' || $2 limit $3 ), github_issue as ( select 'github_issue' as type, repository_full_name || ' ' || title as source, to_char(created_at, 'YYYY-MM-DD') as date, html_url as link, substring(body from 1 for 200) || '...' as content from github_search_issue where $1 ~ 'github_issue' and query = ' in:body in:comments org:${local.config.github_org} ' || $2 limit $3 ), gdrive as ( select 'gdrive' as type, replace(mime_type,'application/vnd.google-apps.','') as source, to_char(created_time, 'YYYY-MM-DD') as date, 'https://docs.google.com/document/d/' || id as link, name as content from googleworkspace_drive_my_file where $1 ~ 'gdrive' and query = 'fullText contains ' || '''' || $2 || '''' limit $3 ) select * from slack union select * from github_issue union select * from gdrive order by date desc
Dashboards as code
Many dashboard systems can work with this query. You could, for example, connect Metabase, tableau, or another Postgres client to Steampipe and create the same kind of interactive dashboard as shown here. You would do this job in a low-code environment where widgets and settings are managed in a user interface. Vapor Pipes dashboard subsystem takes a different approach informed by its infrastructure as code (IaC) roots. API queries should be expressed in managed SQL code, like any other code, in version-controlled repositories. Dashboard widgets that display the results of these queries must also be expressed in code, and in this case, the language is that of Terraform. HCL.
Here is the HCL definition of the metasearch dashboard. It declares three types of hall to block: Sources (multiple selection), search_term (text), and max_per_source (single selection, which is the default). You can do a lot more with the hall block: in particular, you can fill it with the results of an SQL query, as shown in the documentation. It’s not necessary here, however.
the table block uses the query defined above and sets the parameters passed to it. the wrap guarantees that columns with a lot of text will be readable.
dashboard "metasearch" { input "sources" { title = "sources" type = "multiselect" width = 2 option "slack" {} option "github_issue" {} option "gdrive" {} } input "search_term" { type = "text" width = 2 title = "search term" } input "max_per_source" { title = "max per source" width = 2 option "2" {} option "5" {} option "10" {} option "20" {} } table { title = "search slack + github + gdrive" query = query.metasearch args = [ self.input.sources, self.input.search_term, self.input.max_per_source ] column "source" { wrap = "all" } column "link" { wrap = "all" } column "content" { wrap = "all" } } }
Again, there’s not much else to see here, and there shouldn’t be. Building dashboards as code shouldn’t require a lot of complex code, and it doesn’t.
No magic required
Just as you don’t need to be an SQL wizard to create new subqueries, you also don’t need to be an HCL wizard to add them to the dashboard. Do you want to add sources? There are dozens more plugins to choose from, with more added every month. Not all of them offer search, but many do, and they’re easy to find with (of course!) a Steampipe query.
select name html_url from github_search_code where query = 'search org:turbot org:francois2metz org:ellisvalentiner org:theapsgroup' and name ~ 'table' and name ~ 'search' order by name
In the Steam Pipe samples repo, we included the dashboard code shown here, along with an additional search stanza for Zendesk that we removed when our trial account expired. Have fun expanding this dashboard! If a search API you need isn’t already available, drop by our Slack Community and let us know. Someone might already be writing the plugin you need, or maybe you’d like to tackle it yourself. Each new plugin empowers anyone who can work with basic HCL plus SQL to use APIs like a pro and solve real problems.
Copyright © 2022 IDG Communications, Inc.