Working with the data - **needs updating**

Once the data request form has been completed, the user will receive instructions for accessing the requested data subset. The subset is stored in a location that only the user and their team can access.

The path of the file

The DS team will provide the researcher with the path of the file, which the researcher needs in order to access the data.
To do: prepare a standardised message that the DS team can use. It can be in English, as they all speak it.
As an example:
The data path is: s3://907743700548-mlops-eu-iese-privacy-safe/workspaces/demo/
The name of the file is: fintonic_demo_restricted.csv
(To do: show a new example.)
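The full path used in the scripts further down is simply the data path followed by the file name. A minimal sketch of that join, plus splitting the resulting URI into bucket and key (useful when a tool asks for them separately, as boto3's `get_object` does with its `Bucket` and `Key` arguments). The helper names `full_s3_path` and `split_s3_uri` are hypothetical and not part of the platform.

```python
from urllib.parse import urlparse

# Hypothetical helpers, shown for illustration only; not platform tooling.


def full_s3_path(data_path: str, file_name: str) -> str:
    """Join the data path sent by the DS team with the file name."""
    # Strip any trailing slash so the result has exactly one separator.
    return data_path.rstrip("/") + "/" + file_name


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The network location is the bucket; the path without its leading slash is the key.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    uri = full_s3_path(
        "s3://907743700548-mlops-eu-iese-privacy-safe/workspaces/demo/",
        "fintonic_demo_restricted.csv",
    )
    print(uri)
    print(split_s3_uri(uri))
```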

Everything below describes the current Databricks set-up and will need to change to reflect JupyterHub.
Action: James to set up Alfonso in JupyterHub.
Action: Alfonso to produce JupyterHub help (as per below).
Action: James to complete the migration from Databricks to JupyterHub.

How to access the data


To access the data, follow these steps:
1.- Start a cluster, or make sure one is already running.
First check that the cluster is valid for the group or project where the file is allocated: in the Compute area, click the link of the cluster, click the EDIT button and check the Policy shown at the top of the description.
In this case, the policy is "Demo Cluster Policy".

Then start the cluster if it is not already running.
2.- Open an existing notebook, or create a new one.
3.- To create a new notebook, select Create a New Notebook.
4.- Complete the form. Remember that you can select the programming language from the Default Language drop-down list:
R
Python
SQL
Scala
5.- Do not forget to attach the notebook to a cluster (the one started in step 1).
Now you are ready to access the data.
In Python, first install the libraries needed to read files stored in S3:
Enter %pip install fsspec in a cell of the notebook and click Run.
Create a new cell by clicking the + under the current cell.
Enter %pip install s3fs and click Run.
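Optionally, a new cell can confirm that both libraries are importable before moving on. A minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name.

```python
import importlib.util


def missing_packages(names=("fsspec", "s3fs")) -> list[str]:
    """Return the names from `names` that cannot be imported in this environment."""
    # find_spec returns None when no top-level module with that name can be found.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("fsspec and s3fs are installed.")
```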
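6.- Now load the CSV file that was created for you and placed in the folder shared with you into a dataframe you can work with in Python:
Script: df = spark.read.option('header', True).csv('s3://907743700548-mlops-eu-iese-privacy-safe/workspaces/demo/fintonic_demo_restricted.csv').toPandas()  # the argument of csv() is the full path of the file
Note that toPandas() collects the whole dataset into the memory of the cluster's driver, so it is only suitable for data that fits there.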
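The fsspec and s3fs libraries installed earlier are what allow pandas to read s3:// paths directly, so the same file can also be loaded without going through Spark. A minimal sketch, assuming the notebook environment already has read access to the workspace folder (this has not been checked on the platform). `load_dataset` is a hypothetical helper name; because it also accepts local paths, it can be tried on a small local CSV first.

```python
import pandas as pd


def load_dataset(path: str) -> pd.DataFrame:
    """Read a CSV file with a header row into a pandas DataFrame.

    For s3:// paths, pandas delegates to fsspec/s3fs, so both must be installed
    and the environment must have read access to the bucket (assumption).
    """
    return pd.read_csv(path)


# Example call with the demo path from this page (requires S3 access):
# df = load_dataset(
#     "s3://907743700548-mlops-eu-iese-privacy-safe/workspaces/demo/"
#     "fintonic_demo_restricted.csv"
# )
```

One difference from the Spark script: with only the header option, Spark reads every column as a string, whereas `pandas.read_csv` infers numeric types, so column dtypes can differ between the two approaches.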


7.- Display the data in the platform.
Create a new cell and type:
display(df)  # where `df` is the name of the dataframe
Then click Run.
The result is a table.
Clicking the icon at the bottom left of the table converts it into a chart.
These charts can be saved to a dashboard, which gives the user a convenient overview of the data to explore.
The dashboard can be shared and exported as an HTML file.
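Outside Databricks (for example after the move to JupyterHub), the built-in chart and dashboard buttons will not be available. As a rough pandas-only sketch, not a replacement for the dashboard: `overview` summarises each column, and `export_html` writes the first rows of the table to an HTML file. Both helper names and the 100-row default are hypothetical choices for illustration.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, number of missing values and number of distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "distinct": df.nunique(),  # NaN is not counted as a distinct value
        }
    )


def export_html(df: pd.DataFrame, path: str, max_rows: int = 100) -> None:
    """Write the first `max_rows` rows of df to `path` as an HTML table."""
    df.head(max_rows).to_html(path, index=False)
```

In a Jupyter notebook, `display(overview(df))` shows the summary as a table.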

Some aids and scripts

Path: Notebooks - User - Shared - iese_data_Exchange_01_utils

LIBRARIES

