Husain Al-Mohssen

PhD, Data Scientist/Architect

Husain's main focus is engineering science and its application to creating profitable products that serve tens or hundreds of thousands of users. Husain's work at MIT focused on extracting faint signals from supercomputer-scale simulations of gases close to equilibrium. He later built on this background to start a user-facing email analytics company that served thousands of paying customers in near-real time. Husain has extensive software engineering experience as a developer, maintainer, and architect in enterprise software, high-performance computing, and the “Big Data” domain. Husain is also an accomplished mechanical engineer, with many years of hands-on experience in the oil and power industries.

Presentations

Sooner or later you will probably be working on projects with words like “Data Science”, “Machine Learning”, or “Big Data” attached to them. This talk will help you deliver these projects successfully by reviewing the key items you need to consider when incorporating analytical solutions into shipping products.

This talk will focus on answering these questions at scale:

  • What are the most Agile ways to develop and deploy data-science-heavy workloads?
  • What is special about the computing environments used for data science or machine learning, compared with those for workloads that are not science-heavy?
  • What should we focus on when designing data transport and routing options in data science applications? How can we control costs and improve performance when doing this work?
  • How do we set up test environments that verify the reliability of data science workloads?

The solutions we will be talking about combine tried-and-true techniques with very non-intuitive suggestions that make no sense in regular contexts. By the end of the talk you should be able to reason through the different design trade-offs when creating a sane and reliable production system.

Most companies are dramatically increasing their ability to generate and collect large amounts of data, and most of them are trying to use statistics and machine learning to get something from this data. Sooner or later you will probably be working on a project that has a significant “Data Science” component, or be expected to give input about building such a system. This talk will introduce a number of machine learning techniques for solving data science questions using the R programming language.

This hands-on talk will be a crash course in the fun world of Machine Learning (ML). The tutorial will be both a theoretical survey of common machine learning techniques and a quick introduction to R and how to use it to solve some practical problems using ML. The talk will not make you an expert on machine learning tools, but it will give you a solid feel for what can be done with them and help you understand what your data scientist can realistically do for your organization with off-the-shelf libraries, and the challenges she has to deal with day to day. This is the first of two sessions. In this session we will cover:

  • A brief introduction to R and the RStudio IDE
  • A taxonomy of ML algorithms
  • Linear regression from a ML perspective
  • Logistic regression
  • Support Vector Machines (SVM)
  • Curse of Dimensionality
  • Dimensionality Reduction: Principal Component Analysis

Humanity has dramatically increased its ability to generate and retain astronomically large amounts of data in the last decade. Sooner or later you will probably be working on a project that has a significant “Data Science” component, or at least be expected to give input about building such a system. This is the second of a two-part introduction to using machine learning techniques to solve data science questions with the R programming language.

This hands-on talk will be a crash course in the fun world of Machine Learning (ML), aimed at absolute beginners. The tutorial will be both a theoretical survey of common machine learning techniques and a quick introduction to R and how to use it to solve some practical problems using ML. The talk will not make you an expert on machine learning tools, but it will give you a solid feel for what can be done with them and help you understand what your data scientist can realistically do for your organization with off-the-shelf libraries, and the challenges she has to deal with day to day. This is the second of two sessions. In this session we will cover:

  • A brief introduction to R and the RStudio IDE
  • Neural Networks
  • Auto-Encoders
  • K-Means Clustering
  • Graphical Models
  • Hidden Markov Models
  • Feature Generation
  • Bias-Variance Trade-Off and Model Selection