The first part of this post was about the basic setup and running Zookeeper, Drill and Hadoop HDFS. In this second part we will add a CSV file to HDFS and query it from Drill.

In the Drill web UI, go to the "Storage" page. At the bottom, under "New Storage Plugin", enter hdfs in the textbox and click on "Create". You will see a page titled "Configuration". I filled in the information as shown below. You can also copy the configuration of an existing storage plugin (e.g. "dfs") and modify it for HDFS. Basically you only need to change the values for "connection" and the workspace "location".

Everything is now configured and we are ready to query the Hadoop filesystem. But we have no files in HDFS yet, so let's copy a CSV file into HDFS. We will use the hdfs command, a script in /opt/hadoop/bin, to do this.

Go back to the Drill web UI and click on "Query" at the top. Enter the following query (note the backticks at the beginning and end of the filename) and submit it. Because we have used "select * ...", Drill returns each row as an array of fields.

Let's run a slightly different query that selects only some of the fields and displays more nicely:

So we successfully queried the Hadoop filesystem using Drill.
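The plugin configuration and the two kinds of queries described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the exact setup from this post: the NameNode address `hdfs://localhost:9000`, the workspace name `root`, the file `/example.csv` and both helper functions are hypothetical, so adjust them to your own installation.

```python
import json


def hdfs_plugin_config(connection, location):
    """Build a file-type storage plugin config like the one pasted into the Drill web UI.

    Only "connection" and the workspace "location" differ from a copied "dfs" config.
    """
    return {
        "type": "file",
        "enabled": True,
        "connection": connection,  # assumed NameNode address
        "workspaces": {
            "root": {  # hypothetical workspace name
                "location": location,
                "writable": False,
                "defaultInputFormat": None,
            }
        },
        "formats": {
            "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","}
        },
    }


def select_columns_query(plugin, path, indexes, limit=10):
    """Build a query that picks single fields out of the array Drill returns for CSV rows."""
    fields = ", ".join(f"columns[{i}] AS col_{i}" for i in indexes)
    # The backticks around the file path are required by Drill.
    return f"SELECT {fields} FROM {plugin}.`{path}` LIMIT {limit}"


if __name__ == "__main__":
    # Copying a file into HDFS beforehand would look like:
    #   /opt/hadoop/bin/hdfs dfs -put example.csv /
    print(json.dumps(hdfs_plugin_config("hdfs://localhost:9000", "/"), indent=2))
    print(select_columns_query("hdfs", "/example.csv", [0, 1]))
```

The JSON printed by the first call can be pasted into the "Configuration" page of the plugin, and the generated SQL can be entered on the "Query" page. Selecting `columns[0]`, `columns[1]` and so on with aliases is what gives a nicer display than "select *".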
Author: Uwe Geercken