
SEMOSS User Guide

All you need to begin your journey to data wizardry using the SEMOSS analytics platform

Introduction

So, you've just downloaded SEMOSS. Now what?

Congratulations! You've just discovered SEMOSS, an innovative, open source application that allows users to solve challenging problems by exploring and uncovering connections in data, creating tailored visualizations, and running custom algorithms.

This quick start user guide walks you step by step through what comes after you download SEMOSS, so you can start making your first visualization in minutes. We'll show you the ropes on how to make a visualization, edit it to your liking, and share it with others.

By the end of this user guide, we will have taught you the entire SEMOSS analytics process, as diagrammed in this nifty graphic below:

SEMOSS Donut Chart

So, let's jump right into it!


Installation

The Basics: Installing SEMOSS

First things first, let's get SEMOSS installed on your machine.


Minimum Requirements

To run SEMOSS on your machine, you'll need Google Chrome. Check that you have this program and then we can move on to configuring SEMOSS on your computer.


Configuration Steps


To install SEMOSS on your computer:

  1. Download SEMOSS from our website (kudos, if you've already done that!)
  2. Unzip the downloaded file directly to your C drive (you can find the drive in a File Explorer window).
  3. Navigate to C:\SEMOSS in a File Explorer window.
  4. Locate the “startSEMOSS.bat” file and double-click it. A Tomcat window and a Windows command prompt window will open, and the prompt will display “Do you have R installed? y/n”. Enter “n” and R will be set up for you. If you already have R installed, follow the command line instructions instead. Finally, SEMOSS should open in a new Google Chrome window.


Set up System Path to incorporate R (Windows Environment)

  1. Navigate to Environment Variables.
  2. In the System variables section, select Path. Click Edit.
  3. Go to the end of the variable value and append the following (note that the semicolons are necessary to separate values):
    ;C:\SEMOSS\semosshome\portables\R-Portable\App\R-Portable;C:\SEMOSS\semosshome\portables\R-Portable\App\R-Portable\library\rJava\jri\x64;C:\SEMOSS\semosshome\portables\R-Portable\App\R-Portable\bin\x64; javac -d java.library.path;

Environment Variables

Add new System variables for R (Windows Environment)

  1. At the bottom under System Variables click New
  2. In Variable Name enter R_LIBS
  3. In Variable Value enter C:\SEMOSS\semosshome\portables\R-Portable\App\R-Portable\library
  4. Click OK.
  5. Click New again
  6. In Variable Name enter R_HOME
  7. In Variable Value enter C:\SEMOSS\semosshome\portables\R-Portable\App\R-Portable
  8. Click OK.
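
If you'd like to double-check the values before typing them into the Environment Variables dialog, here is a minimal sketch that derives them from the install folder. It assumes SEMOSS was unzipped to C:\SEMOSS as described above. The helper names (`r_environment`, `missing_path_entries`) are made up for this illustration and are not part of SEMOSS, and the sketch only computes strings; it does not change any system settings.

```python
from pathlib import PureWindowsPath

# Assumption: SEMOSS was unzipped to C:\SEMOSS, as in the steps above.
SEMOSS_ROOT = PureWindowsPath(r"C:\SEMOSS")


def r_environment(root):
    """Return the R-related values described in the steps above."""
    r_home = root / "semosshome" / "portables" / "R-Portable" / "App" / "R-Portable"
    return {
        "R_HOME": str(r_home),
        "R_LIBS": str(r_home / "library"),
        "PATH_ENTRIES": [
            str(r_home),
            str(r_home / "library" / "rJava" / "jri" / "x64"),
            str(r_home / "bin" / "x64"),
        ],
    }


def missing_path_entries(current_path, wanted):
    """List the wanted folders that are not yet on a ';'-separated Path value."""
    present = {
        part.strip().rstrip("\\").lower()
        for part in current_path.split(";")
        if part.strip()
    }
    return [w for w in wanted if w.rstrip("\\").lower() not in present]


env = r_environment(SEMOSS_ROOT)
print("R_HOME =", env["R_HOME"])
print("R_LIBS =", env["R_LIBS"])

# Example Path value that already contains the first of the three folders.
example_path = r"C:\Windows;" + env["PATH_ENTRIES"][0]
print("Still to append:", missing_path_entries(example_path, env["PATH_ENTRIES"]))
```

Paste your own Path value in place of `example_path` to see which of the three folders still need to be appended.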



How You'll Know You Did It Right

After double-clicking "startSEMOSS.bat", SEMOSS will open in a new Google Chrome Window.


Building a Viz

Getting Started: Building Your First Viz

Now that you've got SEMOSS open and running, we can get to what you've been really waiting for – exploring your data. Creating your first SEMOSS visualization is easy and quick.
First, compile the data you want to work with in a .CSV or .XLSX file. After you've done so, let's look at how you can begin uncovering insights.



Uploading Data: Drag + Drop

SEMOSS includes a number of built-in graphs and charts that let you drag and drop which nodes and properties you want to investigate in greater detail.


To begin creating your visualization:

  1. From the landing page, select the "Add New Visualization" or "+" button in the top menu bar.
  2. Select "Add from File or Raw Data".
  3. There are three options for quickly transferring your file:
    1. Choose "Browse Files" in the upper right corner of the window and navigate to the appropriate file
    2. Drag the file directly from your desktop or folder into the indicated space
    3. Copy and paste from another table into the blank space
  4. SEMOSS will return to the upload screen with the file that you selected. Click the "Next" button.
  5. SEMOSS will prompt you to categorize each column of data as "string", "numerical", or "date". SEMOSS will automatically populate this classification for you, but it is important to confirm that each column is represented accurately. Finish the upload process by clicking the "Load" button.
    1. A small yellow exclamation point indicates that SEMOSS edited the column header name to make it cleaner and more accessible.
    2. Click the "X" button to hide the column completely.
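
To get a feel for what confirming the "string", "numerical", or "date" classification involves, here is a rough sketch of one way a column could be classified. This is not SEMOSS's actual detection logic: the function names, the sample data, and the single date format (`%Y-%m-%d`) are all assumptions made for the illustration.

```python
import csv
import io
from datetime import datetime


def classify(values, date_format="%Y-%m-%d"):
    """Classify a column as 'numerical', 'date', or 'string'."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "string"

    def all_parse(parser):
        # True only if every non-empty value parses without error.
        try:
            for value in non_empty:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "numerical"
    if all_parse(lambda v: datetime.strptime(v, date_format)):
        return "date"
    return "string"


def classify_columns(csv_text):
    """Return {column header: guessed type} for CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    headers = list(rows[0].keys()) if rows else []
    return {h: classify([row[h] or "" for row in rows]) for h in headers}


# Hypothetical sample data, not taken from SEMOSS.
sample = (
    "Title,Budget,Release\n"
    "Alpha,1200000,2015-06-01\n"
    "Beta,950000.5,2016-01-15\n"
)
print(classify_columns(sample))
```

A single stray value (say, "N/A" in a budget column) is enough to turn a numerical column into a string column in this sketch, which is one reason it is worth reviewing each column before you click "Load".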

Great! Now that your data is uploaded, we can move on to working with it through a SEMOSS visualization.


Creating Your First Viz


To begin visualizing your data:

  1. SEMOSS will populate your uploaded data in a grid table for you. Select a visualization type from the menu on the right-hand panel. For a full list of SEMOSS visualizations and some tips on choosing which type is best for your purposes, jump to the Visualization library section.
  2. Select the visualization type you want to work with. The toolbar will prompt you to select the columns needed to produce the visualization.


And just like that, you've built your first visualization!

So what should you do if you want to customize it and make it your own? Glad you asked…


Choosing a Viz Type

SEMOSS includes a number of built-in visualization types that can be customized pretty extensively. You should experiment with different viz types, keeping in mind that some types of data are better suited to certain types of visualizations. For example, categorical data can be useful for comparing multiple types of objects and is better displayed through a bar or pie chart. However, if you are more interested in using an arithmetic operator on categorical data, your choices may be more limited than if you had chosen to use numerical data (for which you might want to choose something like a line chart).

Skim through our viz library to determine which type is best for you:


Output                 Description
Clustering             A way to group instances based on some specific property
Bar                    A graph that groups data based on values selected
Grid                   A display of data in a tabular format with concepts as column headers
Heat Map               A matrix of color hues that graphically represent the correlation between two concepts
Line                   A representation of concepts graphed against a selected variable over time
Dendrogram             A visualization that illustrates connections between concepts in a tree-diagram
Parallel Coordinates   A visualization that illustrates relationships across multiple concepts
Pie                    A circular chart that shows the relative size of each value
Scatter                A representation of concepts graphed against selected variables
Scatter Matrix         A multi-dimensional scatter plot
World Map              A geographical representation of a data set
Sunburst               A circular chart that shows the relative size of each value
Single Axis Cluster    A way to plot values across an X axis based on a numeric value
Force Graph            A visual representation of the connections between concepts in a network diagram
Gantt Chart            A way to depict values changed over time in relation to amounts planned

So you've built your first viz, but you decide that you want to change the variables used or the viz type. Or maybe you want to change axis names or add some color to dazzle your audience. Let's run through some quick modification tutorials.


Changing Viz Type

To change the visualization type, simply select the viz type icon. If applicable, SEMOSS will prompt you to modify the input labels and values.



Changing Axis Names


To change the titles of your X and Y axes:

  1. Click the Toggle Menu and select "Edit Visualization".
  2. Click the current X or Y axis label. Enter the new title you'd like to use, choose a new text color for further customization, and click "Apply".


Changing Viz Colors


To alter the color scheme with your own colors or built-in color swatches:

  1. Click the Toggle Menu and select "Edit Visualization".
  2. Click individual visualization points (like bars on a bar chart) and select a color.
  3. Click the Toggle Menu and select "Color Visualization" to change the entire color theme. Choose a color palette or create your own.


Adding Comments


To annotate your visualization, take the following steps:

  1. Select the "Comment" option which has a speech bubble icon.
  2. Click the portion of the visualization you'd like to add a comment to.
  3. Enter your comment and select the "Submit Comment" button.


Saving your Viz

SEMOSS allows you to save your visualization as an "Insight" so that you can return to it later and share your findings with colleagues. When you save an Insight in SEMOSS, it will show up on your homepage in a feed under its parent database.

Throughout your data exploration process, you'll want to save your Insights regularly so that you can return to them for analysis later. Additionally, Insights that you save can be incorporated into a custom Dashboard or a Report.


To save your visualization into an Insight:

  1. Once you are satisfied with your visualization, select the floppy disk icon on the right hand side to save your visualization.
  2. Create a name for the Insight, and select which database it belongs to. Specify here if you'd like to add any filters.
  3. Click "Save As New". You can access this visualization at any time by selecting it in the Feed or by using the search toolbar.

Once you see the green notification box that says "Success" you'll know you did it right.



Reloading your Viz

Congratulations on successfully building and saving your first viz! Here's how to reload your visualization.

Every SEMOSS page includes a Search Bar at the top of the workspace that allows users to quickly find a saved Insight. This functionality searches through all Insight titles and perspective names to return matching results grouped by database. Users can filter by visualization type, database, and tags to more swiftly locate their Insights. Each Insight is tied to a specific database. The Search function has a continuous scroll feature, helping you browse through the repository of saved Insights.


To reload/reaccess your saved Insight:

  1. Begin typing the name of the visualization or database you are looking for in the search bar at the top of the console. After entering a search term, hit "Enter" to populate a list of related Insights.
  2. Select the visualization in the generated list.
  3. SEMOSS will open your selected Insight in a new pop-up window.
  4. Alternatively, reload it by double-clicking the visualization in the feed.

The Essentials: Tips and Tricks

Now you've mastered the basics. What's next?

SEMOSS basics are fairly straightforward. Thankfully, there are even more pretty slick features that can facilitate your data analysis process and help you spend more time in your data, and less time trying to figure out how to make the data work for you.


Cleaning your data

SEMOSS now allows users to clean messy datasets faster so they can start generating insights sooner. Supported by powerful R analytics routines, the SEMOSS Cleaning Widget gives you a wide range of functions that will help you better visualize your data. Explore the Cleaning Widget with a few simple steps:


To access the Cleaning Widget:

  1. If you have already uploaded a database, click “Add New Insight” in the top-right of your UI. If not, click “Upload Data” and go through the upload process.
  2. Once you are satisfied with your metamodel, click “Visualize”.
  3. In the Toggle Menu, type “Clean” into the search bar and click on the icon. A new UI will open.
  4. The Clean UI has many useful functions to clean your data. To start, click on the header of a column to view a histogram of that column's values. This may be helpful in spotting naming discrepancies in your data or identifying null values.
  5. In the bottom pane, explore the other functions available to you. Some useful functions include renaming a column, dropping a column, and splitting a column based on a given delimiter. Click “execute routine” to run a given routine.
  6. When you are satisfied with your adjustments, you can close the menu and the new displayed grid will include all of your changes.
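
To make the cleaning functions above concrete, here is a small sketch of what counting column values, renaming a column, dropping a column, and splitting a column on a delimiter do to a table. It uses plain Python lists of dictionaries rather than the R routines behind the Cleaning Widget, and every function name and sample value here is hypothetical.

```python
from collections import Counter


def value_counts(rows, column):
    """Count how often each value appears (the idea behind the histogram)."""
    return Counter(row[column] for row in rows)


def rename_column(rows, old, new):
    """Return new rows with the column 'old' renamed to 'new'."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def drop_column(rows, name):
    """Return new rows without the column 'name'."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


def split_column(rows, name, delimiter, new_names):
    """Split one text column into several columns on a delimiter."""
    result = []
    for row in rows:
        parts = row[name].split(delimiter, len(new_names) - 1)
        parts += [""] * (len(new_names) - len(parts))  # pad short rows
        new_row = {k: v for k, v in row.items() if k != name}
        new_row.update(zip(new_names, parts))
        result.append(new_row)
    return result


# Hypothetical messy data: inconsistent casing and a combined name column.
rows = [
    {"Name": "Doe, Jane", "Dept": "HR", "Temp": "x"},
    {"Name": "Roe, Rich", "Dept": "hr", "Temp": "y"},
]

print(value_counts(rows, "Dept"))  # reveals the "HR" vs "hr" discrepancy

cleaned = drop_column(rows, "Temp")
cleaned = rename_column(cleaned, "Dept", "Department")
cleaned = split_column(cleaned, "Name", ", ", ["Last", "First"])
print(cleaned)
```

Each function returns a new table instead of editing the original, which loosely mirrors how you can keep adjusting in the Clean UI and only see the combined result in the grid once you close the menu.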


Cloning your Viz

Duplicating your SEMOSS visualization takes just one click, and can be useful when you want to make a small edit or check out a downstream effect without losing your original viz. Cloning is also great for building dashboards, if you want to see how moving different analytical mechanisms affects your data.


To duplicate your visualization:

  1. Once you have an open visualization, select the Toggle Menu, click "Tools", and select "Clone Visualization".
  2. The duplicated visual will open in another window.


Filtering

When used correctly, the Filter Tool is extremely useful. Filtering is great for going beyond holistic analysis by enabling deep dives into your data. By default, when you select a column to analyze, SEMOSS will look at the entire column. By using Filter, you can dive into the details about a certain instance or a group of instances. Use filtering tools to get a closer look at certain aspects of your data.


To filter down your visualization:

  1. Select the "Tools" option on the top of the right-hand toolbar.
  2. Select the "Filter" button which has the image of a filter on it.
  3. Choose the parameters you'd like to filter down into. If you want to look at just one or two instances, uncheck the "Select All" button at the top of the list.
  4. Click "Apply Filter" to update the visualization to reflect that selection.
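
Conceptually, applying a filter keeps only the rows whose value in the chosen column is one of the instances you left checked. The sketch below illustrates that idea on a small made-up table; `apply_filter` is a hypothetical helper, not a SEMOSS function.

```python
def apply_filter(rows, column, selected):
    """Keep only rows whose value in 'column' is among the selected instances."""
    selected = set(selected)
    return [row for row in rows if row[column] in selected]


# Hypothetical data set.
movies = [
    {"Title": "Film X", "Genre": "Drama", "Budget": 12},
    {"Title": "Film Y", "Genre": "Comedy", "Budget": 8},
    {"Title": "Film Z", "Genre": "Drama", "Budget": 20},
]

# Default behavior: the whole column is in play ("Select All" is checked).
all_genres = {row["Genre"] for row in movies}
print(len(apply_filter(movies, "Genre", all_genres)))

# Unchecking "Select All" and picking a single instance narrows the view.
print(apply_filter(movies, "Genre", ["Drama"]))
```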


Brush Mode

Use the "Brush" feature to zoom into your visual and get a closer look at your data.


To zoom in your data using the brush feature:

  1. Create a new visualization or pull up an existing one, then click the Toggle Menu icon and select the "Brush Visualization Mode" option.
  2. Click the section of the visual to zoom in.
  3. To unfilter, click anywhere on the visualization.


Dashboards

Now that you're comfortable with building data Insights, modifying them to your liking, and saving them within SEMOSS to access and share them later, let's get into some of the more exciting SEMOSS capabilities, like Dashboards.



SEMOSS allows users to view multiple visuals at a single time through the Dashboards feature. Dashboards are helpful for displaying key metrics and performance indicators on one screen, and can be saved and shared like other SEMOSS visualizations. SEMOSS Dashboards are fully customizable and user-friendly, making it easier than ever to communicate your most important data Insights with stakeholders in a clean, consolidated manner.

Insights must be saved prior to viewing them in the Dashboard. If you decide you want to create a Dashboard while exploring your data, make sure that you save the Insight first before trying to populate it in a Dashboard.

To create a Dashboard:
  1. In a new Sheet, use the SEMOSS search bar to look up the visualizations you would like in your Dashboard. Select each visualization you would like to include by clicking on it.
  2. The selected Insights should appear in the "Visualizations" list on the left-hand toolbar. They should also open in separate windows within your Sheet.
  3. On the Sheet tab, click on the Sheet icon to view a drop down menu. Select the "Dashboard" option.
  4. Click on the visualizations you would like included in your Dashboard on the left-hand menu.
  5. Each visualization will generate in a separate window, creating a Dashboard view. To save your Dashboard, click the floppy disk icon on the right-hand tool bar.
  6. Select a database to save your Dashboard to and give it a name. You can access this Dashboard through the SEMOSS Search Bar next time you would like to pull it up.


To join data within a Dashboard:

Users can also join data or link visualizations from the Dashboard by clicking the "Join Data" button in the upper right hand corner of the Dashboard.


  1. Open a Sheet containing the Dashboard you would like to add to by browsing it in the SEMOSS Search Bar, or creating a new one. Select "Join Data" in the top right-hand corner.
  2. By joining data, users can select concepts from each Insight and combine similar parameters to create a new visualization. Columns are joined by dragging and dropping the parameters into the "Join Columns" box.
  3. When data is successfully joined, clicking on one instance of a visualization will highlight the same instance in all other visualizations in the Dashboard. Click "Save Join" and hit the green exit button to return to your Dashboard.


Reports

Reports, another SEMOSS feature, allow users to quickly and interactively generate a document containing embedded SEMOSS visuals. SEMOSS Reports are highly shareable and are useful for users who need to conduct multiple, similar analyses. Report templates can be saved and reused for similar work in the future. The Reports window includes a panel of selected Insights that can be added to the Report with a single click, making it simple to plug in visualizations where they are relevant within the Report.

Reports use HTML code, and you can type directly into your Report template as you would in a word processor. The style view of Reports provides a word processing toolbar so you can create a document as you would in a program like Microsoft Word.

Reports can only embed visualizations that have been saved as Insights. If you are generating a visual on-the-fly that you would like included in your Report, be sure to save your Insight first.

To create a new Report:

  1. In a new Sheet, use the SEMOSS Search Bar to select the visualizations you would like to include in your Report.
  2. SEMOSS will open your visualizations in new windows on your Sheet.
  3. On the Sheet, select the Sheet icon and select "Report" from the drop down menu.
  4. SEMOSS will open a new window that includes your Report template. Select the visualizations you'd like to include in your Report by clicking on them on the left-hand panel.
  5. Your visualizations will show in the Report template. Use the Reports toolbar to customize your template and include any writing you would like included with your visualization.
  6. Alternatively, you can embed visualizations by clicking the "Source" icon to view the HTML code for your Report. This is where you copy and paste embed codes from other visuals if you would also like them featured or referenced in the Report. Jump to "Embed" to learn how to extract iFrame code.
  7. After copying and pasting the iFrame embed code, view the changes by clicking "Source" again and returning to the Report editing pane.
  8. Edit and stylize your Report and click the floppy disk icon to save the Report, or the printer icon to print it.


Sharing your Insights

Since at this point you've begun to uncover some interesting patterns and points of interest in your data, you're probably wondering how you can practically share what you've found with colleagues and stakeholders.



By hosting your SEMOSS database on a local server, anyone with access to the server can use your database and explore saved Insights. This is a great option for organizations that need to share database capabilities across an enterprise.

You can also save individual visualizations and Insights in various file forms and share them as you would any other file (email, .ZIP file, USB, etc.) with colleagues and stakeholders. Check out how to do that below.

To export your visualization as a .SVG file:
  1. On the menu bar, locate the "Save as SVG" icon which looks like a little picture file.
  2. Your data will download as a .SVG file. Double click to open the file.

To export your visualization as a .CSV file:
  1. On the menu bar, locate the "Export to CSV" icon which is an arrow pointing upwards.
  2. Your data will download as a .CSV file. Double click to open the file.

To transfer your data as a .ZIP file:

Another pragmatic method of sharing your insights is to locate your database folder and .SMSS file, .ZIP it up, and send it to your intended recipient through email or other file transfer.



  1. Locate your SEMOSS workspace folder. Within the "SEMOSS" folder, select "db" and find the database you are working with.
  2. Locate the .SMSS file associated with that database.
  3. Select the folder and the .SMSS file, right click and using the .ZIP program of your choice, zip the files together.
  4. Transfer the .ZIP file to your recipient and instruct them to unzip it, then drag and drop the database folder and corresponding .SMSS file into their own Workspace/SEMOSS/db folder.
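
If you would rather script the zipping step than do it by hand, here is a minimal sketch using Python's standard `zipfile` module. The function name `zip_database` and the example paths in the comments are assumptions for illustration; point the arguments at your own database folder and its .SMSS file.

```python
import zipfile
from pathlib import Path


def zip_database(db_folder, smss_file, out_zip):
    """Bundle a database folder and its .SMSS file into one .ZIP archive."""
    db_folder, smss_file = Path(db_folder), Path(smss_file)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        # Store the .SMSS file at the top level of the archive.
        archive.write(smss_file, smss_file.name)
        # Store every file in the database folder, keeping the folder name
        # so the recipient can drop both items straight into their db folder.
        for path in sorted(db_folder.rglob("*")):
            if path.is_file():
                arcname = Path(db_folder.name) / path.relative_to(db_folder)
                archive.write(path, arcname.as_posix())
    return Path(out_zip)


# Example call (hypothetical database name and locations):
# zip_database(r"C:\SEMOSS\db\Movies", r"C:\SEMOSS\db\Movies.smss",
#              r"C:\Users\me\Desktop\Movies.zip")
```

Write the archive somewhere outside the database folder, as in the commented example, so the .ZIP does not end up inside itself.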

Pause: A Quick Recap

Phew, we've gone through a lot in one sitting! Let's review what we've done so far:


  • Built a new database and structured relationships in your data
  • Created your first visualization
  • Modified it using the SEMOSS toolbar
  • Cloned it, filtered it, and then built a dashboard
  • Made a sample report
  • Shared this viz with others

We're almost there, but we're not quite done showing off some of the other cool features that SEMOSS offers to enhance your data analysis.


Selecting New Data to Work With

When you upload data through the "Create Visualization" feature in SEMOSS, it conveniently saves that data for you in the form of a database. You can access that data by clicking the "Homepage" or "Feed" tab, and scrolling until you see the name of your datasheet and the date. Any Insights you saved will be there as well.


To create a visualization selecting new data:
  1. Identify the toolbar in the top right corner. Click the "+" or "Add New Visualization" button to begin.
  2. A new window will open that will allow you to select which database and data you want to work with. Click the database you'd like to use.
  3. Your database Metamodel will show in the window. Click each column name by selecting the box it's in or the small "+" sign by the name to add the components you want to work with.
  4. You can filter data before loading it into a new visualization.

There are a number of ways you can make your analysis even more dynamic, including adding new data and selecting new components to work with. Let's dive into that in the next section below.



Add New Data to an Existing Database


To add data to an existing database:
  1. Click on "Upload Data" in the right-hand menu. In the New Visualization panel, select the "Add Data to Existing Database" button.
  2. Select which database to add the data to.
  3. Drag and drop the file, scrape a URL, or upload with text.
  4. Proceed to set up Metamodel.


To connect to an external database:
  1. Click on "Upload Data". In the New Visualization panel, select the "Connect to External Database" option.
  2. Enter a name under the "New Database Name" prompt and select the "Database Type": MySQL, Oracle, or Microsoft SQL Server. These three types are supported for connecting to an existing database. For creating a new database, RDBMS DB, Maria DB, and H2 are also supported.
  3. Enter details for establishing a connection to the existing RDBMS database.
    1. Hostname (on local machine 127.0.0.1 or localhost)
    2. Port (optional, by default it'll take 1433)
    3. Schema (the existing database name)
    4. Username and Password
  4. Click "Test Connection" to verify that SEMOSS connects to the database using the properties specified. If the connection is successful, click "Next" to create the Metamodel. All the concepts (tables in the existing DB) will be displayed on the left side, and each can be expanded to see its respective properties (columns in the existing DB).
  5. Click "Add" to add concepts to the Metamodel and select the Primary Key for that concept (table) and click the "Accept" button.
  6. After all the concepts (tables) have been added, make connections and select the properties (columns) to be joined.
  7. Click Upload after the Metamodel has been defined. A successful upload message will be displayed if the DB was successfully connected with SEMOSS.
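
For orientation, the hostname, port, and schema you enter above are the same pieces that make up a standard JDBC connection string for each of the three database types. The sketch below assembles those strings using the common URL formats and conventional default ports of each database. Whether SEMOSS builds exactly these URLs internally is an assumption, and `jdbc_url` is a hypothetical helper. Note that 1433, the default mentioned above, is the conventional SQL Server port, so for MySQL or Oracle you will likely want to enter the port explicitly.

```python
# Conventional default ports for each database type (not SEMOSS-specific).
DEFAULT_PORTS = {"mysql": 3306, "oracle": 1521, "sqlserver": 1433}


def jdbc_url(db_type, host, schema, port=None):
    """Assemble a typical JDBC URL from the connection details."""
    if db_type not in DEFAULT_PORTS:
        raise ValueError(f"unsupported database type: {db_type}")
    port = port or DEFAULT_PORTS[db_type]
    if db_type == "mysql":
        return f"jdbc:mysql://{host}:{port}/{schema}"
    if db_type == "sqlserver":
        return f"jdbc:sqlserver://{host}:{port};databaseName={schema}"
    # Oracle thin driver, SID form; treating the schema name as the SID
    # is an assumption made for this sketch.
    return f"jdbc:oracle:thin:@{host}:{port}:{schema}"


# Hypothetical schema name "movies" on the local machine.
for kind in ("mysql", "oracle", "sqlserver"):
    print(jdbc_url(kind, "localhost", "movies"))
```

If "Test Connection" fails, comparing your entries against a string like these can help you spot a wrong port or a misspelled schema name.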

Linking Multiple Datasets to Create a Viz

SEMOSS facilitates cross-dataset analysis with the federation feature. Data federation allows users to work with multiple datasets at once, expanding your understanding of your data and offering greater context/awareness.

There are two ways to federate data in SEMOSS. The first is by joining data by column to create an Insight, and the second is to build a dashboard to get a single view of multiple Insights.

In order to federate data from multiple sources to generate an Insight, each database that you are drawing from must be saved in SEMOSS first.

Next, once you've confirmed that all databases you'd like to use are uploaded, proceed through the following steps.

To create an Insight from multiple data sources:

  1. Select the "Add New Visualization" or "+" button.
  2. Select the first database that you will be accessing.
  3. SEMOSS will open the Metamodel for that database. Select the nodes on the Metamodel that you want to work with in creating your Insight.
  4. After highlighting each node, jump to the "View Metamodel" window. You'll notice that the Metamodel components you highlighted also appear in this window.
  5. On the left panel, select "Select DB". Highlight the second database you would like to incorporate into your Insight. SEMOSS will join the data by column and match by instance.
  6. On the left panel, scroll through the "Select Properties" window and identify the columns/variables you'd like to include in the Insight. SEMOSS will add that node into your Metamodel, and you can see that in the Metamodel selection window.
  7. Click "Visualize" when you have selected all relevant nodes for analysis.
  8. In the visualization menu, select the "Grid" option. Scroll to the right to see your new column appended in the grid.
  9. From here, you can create an Insight as you normally would.
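
To picture what "join the data by column and match by instance" means, here is a small sketch that combines two made-up tables on a shared column, so that the second table's columns are appended to matching rows of the first. It keeps only rows with a match in both tables (an inner join); which join type SEMOSS applies is not stated here, so treat that choice, and the `join_on` helper itself, as assumptions.

```python
def join_on(left_rows, right_rows, left_key, right_key):
    """Append right-hand columns to left-hand rows with a matching key value."""
    # Index the right-hand table by its key for quick lookups.
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)

    joined = []
    for left in left_rows:
        for right in index.get(left[left_key], []):
            merged = dict(left)
            merged.update({k: v for k, v in right.items() if k != right_key})
            joined.append(merged)
    return joined


# Two hypothetical databases that share studio names as instances.
movies = [
    {"Title": "Film X", "Studio": "Studio A"},
    {"Title": "Film Y", "Studio": "Studio B"},
    {"Title": "Film Z", "Studio": "Studio C"},
]
studios = [
    {"StudioName": "Studio A", "City": "Los Angeles"},
    {"StudioName": "Studio B", "City": "London"},
]

for row in join_on(movies, studios, "Studio", "StudioName"):
    print(row)
```

Rows only line up when the instance values match exactly, so it is worth cleaning inconsistent spellings in the joining column before you federate.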

Running Analytics: Built-In Routines

You've got the viz down. Ready to graduate to Analytics?

SEMOSS enables on-the-fly analytics, allowing users to find Insights for any dataset with agility and ease. Standard analytic routines are provided within the platform, but can be supplemented with custom-developed algorithmic and statistical routines.

Choose any of the following built-in routines to get started:

Routine: Description & Use

Clustering
  • Divides the dataset into partitions (clusters) so that the items inside each cluster are similar to each other. Used to cluster people, places, or things to highlight trends and similar tendencies. Past examples include:
    • Identifying high risk patients using similarity in health metrics
    • Finding security threats at an early stage based on people's associations or related attributes
    • Facilitating networking and collaboration among people with similar interests and involvements
    • Identifying technologies with similar functions or capabilities to improve collaboration

Fast Outliers/Outliers
  • Determines outliers by calculating the probability that each instance is an outlier based upon the data provided
  • Identifying high risk patients using similarity in health metrics
  • Finding security threats at an early stage based on people's associations or related attributes
  • Facilitating networking and collaboration among people with similar interests and involvements
  • Identifying technologies with similar functions or capabilities to improve collaboration

Similarity
  • Evaluates how similar a single item is to the overall data set by determining the most frequent values for each attribute
  • Determines degree of variance across the instances

Associated Learning
  • Mines data to identify trends or rules that hold true for most, if not all, entries
  • Outputs if-then statements showing if an attribute meets the provided criteria, then another attribute will have a specified value

Predictive Classification
  • Mines data to identify trends or rules that hold true for most, if not all, entries
  • Generates a mapping that anticipates responses to current questions using similar historical situations and their outcomes. Used to predict outcomes by comparing data to relevant historical situations. Past examples include:
    • Identifying exceptional people or products by analyzing the traits of past success stories
    • Making educated investment decisions by understanding projects that are likely to produce high, positive ROIs
    • Anticipating imminent project issues or failures that employ similar methods and techniques with previously failed issues
    • Designing solutions with positive results by evaluating what qualities or attributes are most important for success

Matrix Regression
  • Calculates the line of best fit to approximate the input variable

Numerical Correlation
  • Compares numerical attributes, one-to-one, to determine the degree of correlation between the two fields and ultimately relations between variables
  • Produces a correlation value and plot for each one-to-one pairing of attributes

Classification
  • Identifies trends prevalent in the data and attributes that exhibit highest entropy and thus are most valuable in identifying similarity
  • Groups similar entities using available data, both numerical and categorical
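
As one worked example of what these routines compute, the sketch below calculates a correlation value for every one-to-one pairing of numerical columns, in the spirit of the Numerical Correlation routine. It uses the standard Pearson correlation coefficient; that SEMOSS uses this particular measure is an assumption, and the function names and sample numbers are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length number lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)


def correlation_table(columns):
    """Return {(column A, column B): correlation} for every pair of columns."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Hypothetical numerical columns.
columns = {
    "Budget": [1, 2, 3, 4],
    "Revenue": [2, 4, 6, 8],
    "Rating": [4, 3, 2, 1],
}
for pair, value in correlation_table(columns).items():
    print(pair, round(value, 3))
```

Values near 1 mean two columns rise together, values near -1 mean one falls as the other rises, and values near 0 suggest no linear relationship.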

To apply an analytics routine:

  1. Select the "Add New Visualization" or "+" button.
  2. Select the database you want to explore and the corresponding Metamodel components.
  3. SEMOSS will open a new visualization window. Select the grid option.
  4. In the “Toggle Menu” handle, click the “Analytics” tab and then click “Run Analytical Routine”.
  5. Select a routine, customize parameters, and click "Run Algorithm". The "i" icon will give you more information about the analytics routine.
  6. Some routines directly output a visual. Others add a new column to the table.
  7. If no visual is generated, select the visual panel from the Toggle Menu to create your own with the new data field.

Scripting in SEMOSS

And now for the fun stuff: advanced data manipulation and analysis using scripting.

If you're looking for a one-stop data analytics shop using scripting, do we have some good news for you. SEMOSS allows you to use Java, R, and PKQL to build custom queries on your dataset.


SEMOSS Architecture

SEMOSS structures your data by placing it in a dataframe for querying. A dataframe is where your data is stored and what SEMOSS uses to gather the data you've asked it to analyze. When you make a visualization, SEMOSS goes into your database to grab the data you select and stores it in one of these dataframes for processing. When you work with a saved Insight, SEMOSS collects the data from its respective dataframe.
To select the dataframe type when creating a new visualization, select "Advanced Options" within the "Add a New Visualization" window. When you're already knee-deep in creating a visualization, use PKQL to switch between frames.

  • H2: Uses a grid structure to organize data. Data will be run through numerical processing and is formatted more like your traditional Excel spreadsheet. This is the default option in SEMOSS.
  • TinkerFrame: Used to traverse between data nodes in a graph structure. TinkerFrame is faster than H2 and better suited to qualitative data and graph algorithms.
  • R: Best for analytics and super-users. For simplified analyses, H2 is sufficient. R is great for expert users who are already familiar with the language.

As the user, you should care because the dataframe can influence how efficiently you can work with your data in creating an Insight.
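
To illustrate the difference between a grid-style frame and a graph-style frame, here is a toy sketch that holds the same made-up data both ways. It is only a conceptual picture; it does not reflect how H2 or TinkerFrame are implemented, and `to_graph` is a hypothetical helper.

```python
from collections import defaultdict

# Grid view: one row per record, one column per concept.
grid = [
    {"Studio": "Studio A", "Title": "Film X"},
    {"Studio": "Studio A", "Title": "Film Y"},
    {"Studio": "Studio B", "Title": "Film Z"},
]


def to_graph(rows, source_col, target_col):
    """Turn two grid columns into nodes connected by edges."""
    graph = defaultdict(set)
    for row in rows:
        # Tag each node with its column so equal values in different
        # columns stay distinct.
        a = (source_col, row[source_col])
        b = (target_col, row[target_col])
        graph[a].add(b)
        graph[b].add(a)
    return graph


graph = to_graph(grid, "Studio", "Title")

# Grid question: scan every row to find Studio A's titles.
from_grid = sorted(r["Title"] for r in grid if r["Studio"] == "Studio A")

# Graph question: step from the Studio A node to its neighbors.
from_graph = sorted(value for _, value in graph[("Studio", "Studio A")])

print(from_grid)
print(from_graph)
```

Both views answer the same question. The grid scans rows, while the graph follows connections from a node, which is the kind of traversal that graph algorithms build on.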

    For database type, SEMOSS offers the following options:

    • Relational Database Management System (RDBMS) (H2, MariaDB, SQL Server): Constructed with tables and queried using SQL, an RDBMS is limited to the data in the database. Each table contains a separate set of data, where columns are properties of the category and rows are individual data entries. Each table also has a Primary Key, a unique identifier that can be used to compare across tables. Tables are selected and joined by constraining data, which can be inefficient when a large query involves many tables.
    • Resource Description Framework (RDF): Constructed with triples and queried using SPARQL, RDF provides essentially unlimited data, but is harder to visualize as a "database". This option uses "resources" rather than tables. Each data triple describes something in Subject-Predicate-Object form. In this structure, each table stores a different category of triples. This method makes it easier to obtain data based on abstract questions or ideas, and predicates can be added to the database without changing schemas or anything else but size. Complex queries can be difficult to execute against an RDF database.
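    If the Subject-Predicate-Object idea feels abstract, the following Java sketch may help. It is an assumption-laden illustration, not how SEMOSS or any RDF engine is implemented: the class TripleSketch, the predicate names, and the sample values are all hypothetical. It stores a few triples in a list and answers the question "what is the object for this subject and predicate?", which is roughly what a very simple SPARQL pattern does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of Subject-Predicate-Object data; not a real RDF store.
final class TripleSketch {
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Returns the object of every triple that matches the subject and predicate.
    static List<String> objectsOf(List<Triple> triples, String subject, String predicate) {
        List<String> out = new ArrayList<>();
        for (Triple t : triples) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                out.add(t.object);
            }
        }
        return out;
    }
}
```

    Adding a new kind of fact is just another triple with a new predicate. Nothing like a table definition has to change, which is the schema flexibility described above.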

    R Integration

    SEMOSS is compatible with R and allows users to run R scripts within SEMOSS. R is a powerful analytics package (that is also open source!) you might consider using to explore your data.


    In using R, you will want to use R packages to run various techniques and manipulate your data. Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. Installing additional packages allows for more functionality using R.


    SEMOSS installs the following packages by default:
    1. Rserve
    2. RJDBC
    3. splitstackshape
    4. data.table
    5. reshape2

    To install additional packages into SEMOSS:
    1. Navigate to C:\SEMOSS\semosshome\portables\R-Portable\SemossConfigR\scripts
    2. Open Packages.R and add your new library
    3. Navigate to C:\SEMOSS\semosshome\portables\R-Portable\SemossConfigR
    4. Run configureR-Portable.bat
    5. Navigate to C:\SEMOSS\semosshome\portables\R-Portable
    6. Run startRserve.bat
    7. Restart SEMOSS

    To run R in SEMOSS:
    1. Open SEMOSS. Select the "Add New Visualization" or "+" button.
    2. Select the database you want to explore and the corresponding Metamodel components.
    3. SEMOSS will open a new visualization window. From the menu open Terminal.
    4. In the new window, select the "Java" icon at the bottom.
    5. In the Java Console type: synchronizeGridToR("dataFrameName");
    6. Switch to the R console and use the dataFrameName you synchronized to manipulate data in R.
    7. Type your R scripts in the R window and hit "Submit". Note that SEMOSS will tell you if you have entered an invalid script.

    You can load specific columns from your dataframe into R by typing: api:R.query([c:column1, c:column2, etc.]);


    Note that hyphens should be removed when referring to a column/header title and that some words are restricted and converted from the loader to the database name (for example, "Date" may become "Date_1").
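    The renaming described above can be pictured with a short Java sketch. Treat it as a guess at the general idea rather than the actual SEMOSS loader logic: the class HeaderNames, the one-word reserved list, and the "_1" suffix rule are assumptions based only on the "Date" example, and the space-to-underscore step is borrowed from the Java section of this guide.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of header clean-up; the real SEMOSS rules may differ.
final class HeaderNames {
    // Assumed reserved words. This guide only mentions "Date" as an example.
    private static final Set<String> RESERVED = Set.of("date");

    static String toDatabaseName(String header) {
        // Drop hyphens, then swap spaces for underscores.
        String name = header.trim().replace("-", "").replace(' ', '_');
        // Assumed rule: a reserved word gets a "_1" suffix.
        return RESERVED.contains(name.toLowerCase(Locale.ROOT)) ? name + "_1" : name;
    }
}
```

    Under these assumptions, "Movie-Budget" becomes "MovieBudget" and "Date" becomes "Date_1". If a column you expect is missing in R, check for a renamed header first.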


    PKQL

    PKQL is a custom-developed query language that allows users to create and customize an Insight through a series of scripts. SEMOSS uses PKQL for a number of reasons: it is easy to learn; it is repeatable, editable, reusable, and shareable; it does not require you to know multiple query languages; and it is compatible with packages from other languages such as R, Java, Scala, etc.

    Every time you click a button in SEMOSS, PKQL commands run in the background to execute that action. Learning PKQL can be a worthy endeavor if you anticipate running numerous routines and patterns: it saves you click-time and generates your intended outputs right away.

    The console section can be used to view or enter PKQL commands. PKQL commands can be written in this panel or copy/pasted from your previous sessions or from a colleague. All PKQL commands must end with a semicolon.

    To run a PKQL query:

    1. Open SEMOSS. Select the "Add New Visualization" or "+" button.
    2. Select the database you want to explore and the corresponding Metamodel components.
    3. Select the Toggle Console from the right-hand toolbar.
    4. Type your PKQL script in the window and hit "Submit".


    As you write a PKQL command, suggested commands will show when you start typing. For example, enter "m:" and you'll see a list of math commands you can use. Remember that every PKQL command ends with a semicolon (;).

    Data can be pulled into an in-memory data frame using: "data.import( api:DB.query ( [cols], ( filters ), ( joins ) ), ( table joins ) );"



    Where:
    • DB is the name of the database to pull from
    • cols is a list of columns to pull from the database
    • filters (optional) is a list of filters to apply to the columns
    • joins lists how to pull relations (inner, outer)
    • table joins are used when you have multiple data.import calls, or are pulling from multiple databases, to explicitly state the mapping
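    To see how the pieces of a data.import command fit together, here is a small Java sketch that assembles the command text from a database name, a column list, and a list of joins. It only builds a string. The class PkqlImportBuilder is hypothetical and not part of SEMOSS, filters and table joins are left out for brevity, and the spacing it emits is assumed to be acceptable because the examples in this guide use varying whitespace.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that only formats text; it does not talk to SEMOSS.
final class PkqlImportBuilder {
    // Each join is {leftColumn, joinType, rightColumn}, e.g. {"Title", "inner.join", "Studio"}.
    static String dataImport(String db, List<String> cols, List<String[]> joins) {
        String colList = cols.stream()
            .map(c -> "c:" + c)
            .collect(Collectors.joining(", "));
        StringBuilder sb = new StringBuilder();
        sb.append("data.import( api:").append(db).append(".query( [").append(colList).append("]");
        if (!joins.isEmpty()) {
            String joinList = joins.stream()
                .map(j -> "[c:" + j[0] + ", " + j[1] + ", c:" + j[2] + "]")
                .collect(Collectors.joining(", "));
            sb.append(", ( ").append(joinList).append(" )");
        }
        // Every PKQL command ends with a semicolon.
        return sb.append(" ) );").toString();
    }
}
```

    Called with the database Movie_DB, the columns Title and Studio, and one inner join between them, the helper returns: data.import( api:Movie_DB.query( [c:Title, c:Studio], ( [c:Title, inner.join, c:Studio] ) ) );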

    PKQL Examples


    Action: Import data from an existing DB
    Steps: Specify which columns to include in the data set.
    Example PKQL: data.import ( api:Movie_DB . query ( [ c:Title, c:Studio, c:Title__RevenueDomestic, c:Title__RevenueInternational ] , ( [ c: Title , inner.join , c: Studio ] , [ c: Title , inner.join, c: Title__RevenueDomestic] , [ c: Title , inner.join, c: Title__RevenueInternational ] ) ) ) ;

    Action: Import data from CSV
    Steps: Specify that the load is from a CSV file, then provide the location of the file on your computer.
    Example PKQL: data.import( api:csvFile.query ( [c:Title, c:Nominated, c:Studio], { 'file' : 'C:\Users\Desktop\Movie Data.csv' } ) );

    Action: Pull a single column from a database
    Steps: Specify the column to pull.
    Example PKQL: data.import( api:Movie_DB.query ( [c:Title] ) ) ;

    Action: Pull a column and its properties from a database
    Steps: Specify the column to pull and the properties. Property names must be concatenated with the attribute they refer to, with two underscores in between.
    Example PKQL: data.import( api:Movie_DB.query ( [c:Title, c:Title__RevenueDomestic, c:Title__RevenueInternational ] ) ) ;

    Action: Pull 2+ columns from a database
    Steps: Specify the columns to pull and the type of join to use. Joins can be inner.join, outer.join, etc.
    Example PKQL: data.import( api:Movie_DB.query ( [ c:Title, c:Studio ], ( [c:Title, inner.join, c:Studio] ) ) ) ;

    Action: Filter data by column
    Steps: Specify which column and parameter.
    Example PKQL: col.filter ( c:Studio = [ 'WB' ] ) ;

    Action: Un-filter data by column
    Steps: Specify which column and parameter.
    Example PKQL: col.filter ( c:Studio = ['WB', 'Sony'] ) ;
      OR col.filter ( c:MovieBudget = [0] ) ;

    Action: Filter data by range of numerical values
    Steps: Specify which column and numerical parameter.
    Example PKQL: col.filter ( c:MovieBudget < [1000000] ) ;

    Action: Filter data during data import
    Steps: Insert the filter section between the column names and joins.
    Example PKQL: data.import ( api: Movie_DB . query ( [ c: Title , c: Studio ] , ( c: Studio = ['WB' ] ) , ( [ c: Title , inner.join , c: Studio ] ) ) ) ;

    Action: Add new calculated column
    Steps: Specify which columns to include and define the equation.
    Example PKQL: col.add ( c:TotalRev, ( c:RevenueDomestic + c:RevenueInternational ) ) ;

    Action: Create visual: General
    Steps: Specify the parameters used to create the visualization. Set the visualization by the panel in reference, the type of visualization being created, and the list of data to include. Use tools for custom settings.
    Example PKQL: panel[0].viz ( Column, [ c:Title, c:TotalRev ] ) ;
      OR panel[0].viz(Column, [c:Title, c:MovieBudget], {ascending:true, color:red});

    Action: Create visual: Grid
    Steps: Show data in grid style.
    Example PKQL: panel[0].viz(Grid, []);

    Action: Create visual: Pie
    Steps: Specify chart by Label and Size.
    Example PKQL: panel[0].viz(Pie, [c:Title, c:MovieBudget]);

    Action: Create visual: Column
    Steps: Specify chart by Label, Value1, Value2, etc.
    Example PKQL: panel[0].viz(Column, [c:Title, c:MovieBudget, c:DomesticRevenue]);

    Action: Create visual: Line
    Steps: Specify chart by Label, xValue, yValue, zValue, Color.
    Example PKQL: panel[0].viz(Line, [c:Year, c:DomesticRevenue]);

    Action: Create visual: Scatter
    Steps: Specify chart by Label, Value1, Value2, etc.
    Example PKQL: panel[0].viz(Scatter, [c:Title, c:DomesticRevenue, c:InternationalRevenue, c:MovieBudget, c:Nominated]);

    Action: Create visual: Parallel Coordinates
    Steps: Specify chart by Label, Value1, Value2, etc.
    Example PKQL: panel[0].viz(ParallelCoordinates, [c:Title, c:Studio, c:Nominated, c:MovieBudget]);

    Action: Create visual: Heat Map
    Steps: Specify chart by xValue, yValue, Heat.
    Example PKQL: col.add(c:TitleCount, m:Count([c:Title], [c:Director, c:Nominated]));

    Action: Customize Visual
    Steps: Specify bar type and color for existing visualization.
    Example PKQL: panel[0].lookandfeel ( { 'bar-col-2-index-0' : {'editable-bar' : '#fb5e59' } } ) ;

    Action: Create 2nd visual
    Steps: Specify which visualization type and parameters.
    Example PKQL: panel[1].viz ( Scatter, [ c:Title, c:RevenueDomestic, c:RevenueInternational, c:TotalRev ] ) ;

    Action: Express a single value
    Steps: Specify bar type and color for existing visualization.
    Example PKQL: panel[0].lookandfeel ( { 'bar-col-2-index-0' : {'editable-bar' : '#fb5e59' } } ) ;

    Action: Add a new column
    Steps: Specify the name of the new column being created, and define how to calculate that column. Define functions using +, -, *, and /. These functions can be used in conjunction with each other. Aggregate values based on a specific parameter such as count values, sum, average, etc.
    Example PKQL: col.add(c:TotalRev, (c:RevenueDomestic + c:RevenueInternational));
      OR col.add(c:PercOfBudget, (c:MovieBudget / (m:Sum([c:MovieBudget]))));
      OR col.add(c:NumMoviesWithSameDirector, (m:Count([c:Title], [c:Director])));
      OR col.add(c:MovieBudgetAverageOnStudio, (m:Average([c:MovieBudget], [c:Studio])));
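    The grouped aggregates in the last example, such as m:Average([c:MovieBudget], [c:Studio]), can be easier to reason about with a plain-Java picture of the result. The sketch below is an interpretation, not SEMOSS internals: it assumes the new column holds, for every row, the average of the value column across all rows that share that row's group. The class GroupedAverageSketch and its sample inputs are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch of a "grouped average" column; semantics are assumed.
final class GroupedAverageSketch {
    // groups.get(i) and values.get(i) describe row i (e.g. Studio and MovieBudget).
    static List<Double> averageByGroup(List<String> groups, List<Double> values) {
        // First pass: mean of the values within each group.
        Map<String, Double> mean = IntStream.range(0, groups.size()).boxed()
            .collect(Collectors.groupingBy(
                i -> groups.get(i),
                Collectors.averagingDouble(i -> values.get(i))));
        // Second pass: every row receives the mean of its own group.
        return groups.stream().map(mean::get).collect(Collectors.toList());
    }
}
```

    With groups WB, WB, Sony and values 10, 30, 50, the new column would read 20, 20, 50 under this interpretation.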

    Recipe

    Any time you create a new visualization or edit an existing one, the left-hand "Recipe" panel allows you to see a quick log of the actions you've taken and the corresponding PKQL script that was generated on the backend to carry out each action. A recipe is essentially a string of PKQLs. Open the Toggle Panel in the PKQL window to copy reproducible PKQL code.

    The recipe panel is on the left-hand side of the SEMOSS platform as the second option under each Tab. Play your steps to retrace your analysis, and click the arrow button to collapse the menu.
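    Since a recipe is a string of PKQL commands that each end with a semicolon, you can split a copied recipe back into individual steps outside of SEMOSS. The Java sketch below shows one way to do that. The class RecipeSplitter is hypothetical, and the sketch assumes single quotes are the only place a semicolon could appear without ending a command.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading a copied recipe; not part of SEMOSS.
final class RecipeSplitter {
    // Splits on semicolons that sit outside single-quoted text.
    // Any trailing text without a closing semicolon is ignored.
    static List<String> commands(String recipe) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (char ch : recipe.toCharArray()) {
            if (ch == '\'') {
                inQuote = !inQuote;
            }
            current.append(ch);
            if (ch == ';' && !inQuote) {
                out.add(current.toString().trim());
                current.setLength(0);
            }
        }
        return out;
    }
}
```

    Each returned entry is one step of the recipe, in the order it was played, which makes it easy to review or reuse a single command.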

    Java

    If you're really savvy, use Java to manipulate your data and run commands. There are many documentation sources you can access to learn more about Java syntax, but we'll cover the very basics that might be pertinent to your SEMOSS use.

    Remember that Java is case-sensitive, and that spaces in names are written as underscores.

    Basic variable types you can use include:

    • byte (8-bit whole number)
    • short (16-bit whole number)
    • int (32-bit whole number)
    • long (64-bit whole number)
    • float (32-bit floating-point number)
    • double (64-bit floating-point number)
    • char (a single character)
    • boolean (true or false)
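    Here is how each of those types looks when declared in plain Java. The class and constant names are only for illustration and have nothing to do with SEMOSS; the point is the literal syntax, such as the trailing L for long, the trailing f for float, and single quotes for char.

```java
// Illustrative declarations of the eight Java primitive types.
final class PrimitiveTypes {
    static final byte TINY = 127;                         // largest byte value
    static final short SMALL = 32_767;                    // largest short value
    static final int COUNT = 2_147_483_647;               // largest int value
    static final long BIG = 9_223_372_036_854_775_807L;   // long literals end with L
    static final float RATIO = 0.5f;                      // float literals end with f
    static final double BUDGET = 1_000_000.0;             // double is the default for decimals
    static final char GRADE = 'A';                        // single quotes, one character
    static final boolean NOMINATED = true;                // true or false
}
```

    Because Java is case-sensitive, "Boolean" and "boolean" are not the same thing: the lowercase names above are the primitive types.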

    Some example functions you could use in Java would be "Pivot" or "RemoveNode" - more functions are being added to SEMOSS daily, so stay tuned to the ever-growing list of Java functions you can incorporate into your analysis.

    That being said, how do you get started deploying Java in SEMOSS? If you've walked through our sections on R and PKQL, opening the Java console and running a command through these steps will look familiar.

    To use Java in SEMOSS:

    1. Open SEMOSS. Select the "Add New Visualization" or "+" button.
    2. Select the database you want to explore and the corresponding Metamodel components.
    3. SEMOSS will open a new visualization window. Click the Toggle Console.
    4. Select the Java option in the console window.
    5. Type the Java command you'd like to run and click "Submit" or press Ctrl + Enter.

    Wrap-Up

    You made it!

    You ought to give yourself a pat on the back - we made it to the end of our quick start user guide! We hope that this guide has gotten you on the right path to using SEMOSS to dig deeper into your data.

    While this guide touches on visualizations and analytics on-the-fly, SEMOSS also supports much more comprehensive data analyses using saved databases and fully-built Metamodels. Don't limit yourself to our introduction to SEMOSS here - the possibilities in SEMOSS are endless!

    We'd love to hear how you're using SEMOSS for smarter data analysis. Share your work with us at semossinfo@gmail.com.

    Happy exploring!

    Appendix

    Metamodels 101


    Building a Metamodel

    When you upload new data, you may consider building a Metamodel to structure relationships within your data. A Metamodel is a set of constraints used to structure the data. Metamodels show what types of data exist in a given database and how different data relate to others. Metamodels can be viewed as a map that shows how one can navigate a given database, and are essentially structured depictions of how data is stored in a database.

    By defining relationships and properties of your data, SEMOSS allows you to run a context-aware analysis. This provides a richer analysis in contrast to examining each column of your data in isolation.

    Metamodels show Nodes (Independent) and Properties (Dependent). Properties, which are similar to characteristics, describe nodes.

    In an example database about movies depicted above, the Nodes are Title, Director, Studio, Genre, and Nominated. The node "Title" has several properties including MovieBudget and RevenueDomestic. While Title is independent (i.e. does not describe a node), properties such as MovieBudget are not meaningful unless the node they describe is also known.
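    The node-and-property relationship also explains the double-underscore names seen in the PKQL examples, such as c:Title__RevenueDomestic: a property is always addressed through the node it describes. The Java sketch below illustrates that naming under the stated convention. The class PropertyNames is hypothetical and not part of SEMOSS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the Node__Property naming convention.
final class PropertyNames {
    // Prefixes each property with its node name and two underscores.
    static List<String> qualified(String node, List<String> properties) {
        List<String> out = new ArrayList<>();
        for (String property : properties) {
            out.add(node + "__" + property);
        }
        return out;
    }
}
```

    For the node Title with the properties MovieBudget and RevenueDomestic, this yields Title__MovieBudget and Title__RevenueDomestic.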


    Creating a New Metamodel

    1. To begin uploading data from the Feed, click the button highlighted in the top right corner (depending on screen size, this button may be expanded to "Upload Data").
    2. Determine which type of file upload you will be performing.
    3. Give your database a name, and select how you will be adding the data:
      1. For "File Upload," either click "Browse Files" and select your .XLSX or .CSV file, or drag and drop the file into the box provided. Choosing "File Upload" to create a new database will allow you to determine the database type, build the database Metamodel, see an auto-visualization, and browse advanced settings.
      2. For "Scrape a URL," provide a link to the webpage in the box provided. Uploading data with "Scrape a URL" or "Upload with Text" places your database directly on the Feed screen.
      3. For "Upload with Text," input the text into the box provided and click upload. The database will be created and placed into the Feed.
    4. Choose the method to build your database model, selecting between "Build as a Flat Table", "Build a Suggested Metamodel", "Create your own Metamodel", or "Select Prop Files".
    5. Click the "Advanced Settings" option under the Upload section to designate the base URI, select custom or Map files, and customize question.

    Next, you can structure your data as a Metamodel to get a more nuanced look at your dataset.


    Structuring a Metamodel

    The Metamodel allows you to see the nodes and relationships between dimensions in the database. "View DB Metamodel" shows the entire database, while "View Metamodel" displays only the selected portions of the database.

    Dimensions can be selected by clicking on respective nodes or by clicking the plus sign ("+") by each dimension's name in the left hand menu. Nodes must connect to data already selected, so complex Metamodels may not allow you to click a dimension without first selecting nodes creating the path between dimensions. Once the desired dimensions are selected, click the "Visualize" button.
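    The rule that a node must connect to data already selected can be sketched as a simple adjacency check. The Java below is an illustration of the idea only, not the SEMOSS implementation: the class MetamodelSelection is hypothetical, relationships are treated as undirected, and the first dimension is assumed to be selectable without restriction.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "must connect to selected data" rule.
final class MetamodelSelection {
    // edges maps a dimension to the dimensions it has a relationship with.
    static boolean canSelect(Map<String, Set<String>> edges, Set<String> selected, String candidate) {
        if (selected.isEmpty()) {
            return true; // assumed: the first dimension can be chosen freely
        }
        for (String s : selected) {
            boolean forward = edges.getOrDefault(s, Set.of()).contains(candidate);
            boolean backward = edges.getOrDefault(candidate, Set.of()).contains(s);
            if (forward || backward) {
                return true;
            }
        }
        return false;
    }
}
```

    If, say, Title relates to both Studio and Director, then with only Studio selected you could add Title but not Director. Selecting Title first creates the path between the two.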

    When you create a visualization or run an analytics routine on SEMOSS, you will be prompted to select components of your Metamodel. This will be the data SEMOSS draws from to create your visualization or analytics routine.

    Metamodels can only be edited at the "Upload" stage, so carefully think through your data structure.


    Option 1: "Build as a Flat Table"

    1. If you choose to build as a flat table, select "Build as a Flat Table" in the dropdown box. Click the "Next" button to see the Data Selection page.
    2. In Data Selection, select the data type of each dimension. Click the corresponding type to ensure the dimension matches its type.
      1. A small yellow exclamation point indicates that SEMOSS edited the column header name to make it cleaner and more accessible.
      2. Click the "X" button to hide the column completely.
    3. Click the "Load" button to begin working with data.
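    When checking data types in the Data Selection step, it can help to think about how a value would be classified as a string, numeric, or date. The Java sketch below is one simple way to guess, offered purely as an illustration: SEMOSS's own detection is not described in this guide, the class ColumnTypeGuess is hypothetical, and dates are only recognized in ISO form (yyyy-MM-dd).

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical type-guessing sketch; not how SEMOSS detects types.
final class ColumnTypeGuess {
    enum Type { NUMERIC, DATE, STRING }

    static Type guess(String value) {
        String text = value.trim();
        try {
            Double.parseDouble(text);
            return Type.NUMERIC;
        } catch (NumberFormatException notANumber) {
            // fall through and try a date next
        }
        try {
            LocalDate.parse(text); // ISO dates only, e.g. 2016-05-01
            return Type.DATE;
        } catch (DateTimeParseException notADate) {
            // fall through to the default
        }
        return Type.STRING;
    }
}
```

    A column whose values mostly come back as NUMERIC but which SEMOSS shows as a string is worth a second look before you click "Load".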


    Option 2: "Create your Own Metamodel"

    When building your database model, another option for structuring your database is to "Create your own Metamodel."


    1. After selecting this option from the list, click the "Metamodel Creator" button to go to the next page.
    2. Add the dimensions you wish to include in your Metamodel. In the left hand menu, click "Add" to bring the dimension into the workspace.
      1. If you decide a dimension is no longer wanted, click the red "X" within the box of the dimension. If you decide there are dimensions that you now want to include, you can "Add" a dimension at any point until uploading the Metamodel.
      Once the dimensions are selected, you can begin defining properties and relationships between the dimensions.
    3. To add a property, first click on the black square on the end of the dimension that is the property.
      1. An arrow will appear and follow your cursor.
      2. If the property is no longer wanted, click the red "X" by the property.
    4. After the Properties are placed under their Subjects, click on the "Edit" button corresponding to any of the properties. A new box (shown below) will appear to define the data types of each property.
    5. Select if the property is a string, numeric, or date.
    6. After completing the selection for each property, click the "Accept" button.
    7. To create relationships between Subjects, click on the black square to bring out the line and arrow from the box.
    8. Move the cursor to the black box of the other Subject in the Relationship.
      1. If the relationship is no longer wanted, click the red "X" between the Subjects.
    9. After creating all desired relationships, click the "Upload" button to upload the database. After uploading, the database should appear on your Homepage as shown below.


    Option 3: "Build Suggested Metamodel"

    If you choose to build your database through "Build Suggested Metamodel," you have the option to either "View Metamodel" or "Upload." Viewing the Metamodel allows you to see the automated Metamodel and make edits to the dimensions included, properties, relationships, and property types.


    1. Follow the steps outlined in the aforementioned section on creating a new Metamodel.

    2. After making the desired changes, click the "Upload" button to upload the database.

    3. After uploading, the database should appear on your Homepage.



    Option 4: "Select Prop Files"

    The final option for building your database Metamodel is "Select Prop Files." Prop files allow you to take your previously created Metamodel and port it onto your new set of data, saving you from having to redefine all of your relationships and properties if you've created a particularly large database that needs to be updated.


    1. After selecting the desired file, select either "View Metamodel" or "Upload." Viewing the Metamodel allows you to see the Metamodel you brought in with your prop file and make edits to the dimensions included, properties, relationships, and property types.

    2. After making the desired changes, click the "Upload" button to upload the database.

    3. After uploading, the database should appear on your Feed.