Commit e1fb926d by Shubham Singh

Initial commit

0 parents
# Created by https://www.gitignore.io/api/linux,python,jupyternotebook
### JupyterNotebook ###
.ipynb_checkpoints
*/.ipynb_checkpoints/*
# Remove previous ipynb_checkpoints
# git rm -r .ipynb_checkpoints/
#
### Linux ###
*~
# temporary files which can be created if a process still has a handle open of a deleted file
.fuse_hidden*
# KDE directory preferences
.directory
# Linux trash folder which might appear on any partition or disk
.Trash-*
# .nfs files are created when an open file is removed but is still being accessed
.nfs*
### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
### Python Patch ###
.venv/
scenv/
*.json
resources/Sentimentanalysis_dataset/
resources/ego-facebook/
resources/authorship-attribution/
resources/*.txt
# End of https://www.gitignore.io/api/linux,python,jupyternotebook
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Twitter\n",
"\n",
"## What is Twitter?\n",
"Twitter is a micro-blogging social network website, where users post 280 (previously 140) characters long messages called 'Tweets'.\n",
"\n",
"## How does a tweet look like?"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<blockquote class=\"twitter-tweet\" data-conversation=\"none\" data-lang=\"en\">\n",
"<p lang=\"en\" dir=\"ltr\">The perfect start for \n",
"<a href=\"https://twitter.com/Argentina?ref_src=twsrc%5Etfw\">\n",
"@Argentina</a> on a huge night of football. \n",
"<a href=\"https://twitter.com/hashtag/NGAARG?src=hash&amp;ref_src=twsrc%5Etfw\">\n",
"#NGAARG</a> 0-1 <a href=\"https://t.co/RbGhyMGBqk\">pic.twitter.com/RbGhyMGBqk</a>\n",
"</p>&mdash; FIFA World Cup 🏆 (@FIFAWorldCup) \n",
"<a href=\n",
"\"https://twitter.com/FIFAWorldCup/status/1011675817624129536?ref_src=twsrc%5Etfw\">\n",
"June 26, 2018</a>\n",
"</blockquote>\n",
"<script async src=\"https://platform.twitter.com/widgets.js\" charset=\"utf-8\"></script>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%html\n",
"<blockquote class=\"twitter-tweet\" data-conversation=\"none\" data-lang=\"en\">\n",
"<p lang=\"en\" dir=\"ltr\">The perfect start for \n",
"<a href=\"https://twitter.com/Argentina?ref_src=twsrc%5Etfw\">\n",
"@Argentina</a> on a huge night of football. \n",
"<a href=\"https://twitter.com/hashtag/NGAARG?src=hash&amp;ref_src=twsrc%5Etfw\">\n",
"#NGAARG</a> 0-1 <a href=\"https://t.co/RbGhyMGBqk\">pic.twitter.com/RbGhyMGBqk</a>\n",
"</p>&mdash; FIFA World Cup 🏆 (@FIFAWorldCup) \n",
"<a href=\n",
"\"https://twitter.com/FIFAWorldCup/status/1011675817624129536?ref_src=twsrc%5Etfw\">\n",
"June 26, 2018</a>\n",
"</blockquote>\n",
"<script async src=\"https://platform.twitter.com/widgets.js\" charset=\"utf-8\"></script>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## User actions on Twitter\n",
"\n",
"+ Tweet -- Post a message with image/video and text within 240 characters on Twitter.\n",
"+ Retweet -- Retweet or share a tweet made by another user within Twitter.\n",
"+ Reply -- Post a message in respose to another user's tweet.\n",
"+ Mentions -- Tag another user in his/her tweet or reply.\n",
"+ Hashtag -- Another tag used to link to a topic or event.\n",
"+ Follow -- Follow or subscribe to a user's tweets. A *Follower* is a user that follows, and the user that is being followed in *followee*.\n",
"+ Search -- To search for tweets posted by other accounts based on a query."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Twitter API\n",
"Twitter provides an application programming interface (API) [[1]](#ref1). The API allows us to interact with the social media in many ways, like get user tweets, message users, search for tweets etc. \n",
"\n",
"### How does one use the API?\n",
"To use the API to do any of the above mentioned actions, the user needs to create a Twitter Developer App and get the following keys:\n",
"+ Customer Key\n",
"+ Consumer Secret\n",
"+ Access Token\n",
"+ Access Token Secret\n",
"\n",
"These are necessary for the authentication process with the API.\n",
"\n",
"### How can I collect data from the API?\n",
"The API has various endpoints to perform various actions. We will primarily be focussing on Search and Streaming.\n",
"\n",
"### API rate limits\n",
"The Twitter API is rate limited in order to avoid the API hits hampering with the behaviour of the social network.\n",
"\n",
"### Libraries used\n",
"+ Tweepy [[2]](#ref2)\n",
"+ jsonpickle [[4]](#ref4)\n",
"\n",
"## Show me the code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import library and initiate the API"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Authentication successfull!!! :D\n"
]
}
],
"source": [
"import tweepy\n",
"\n",
"#Consumer Key (API Key), Consumer Secret (API Secret)\n",
"auth = tweepy.OAuthHandler('<Consumer Key>', \n",
" '<Consumer Secret>')\n",
"# Access Token, Access Token Secret\n",
"auth.set_access_token('<Access Token>', \n",
" '<Access Token Secret>')\n",
"\n",
"api = tweepy.API(auth)\n",
"if (not api):\n",
" print(\"Authentication failed :(\")\n",
"else:\n",
" print(\"Authentication successfull!!! :D\")"
]
},
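{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Handling rate limits (optional sketch)\n",
"As noted above, the API is rate limited. The cell below is a minimal sketch, assuming Tweepy 3.x, that recreates the `api` object so that the library sleeps until the limit resets instead of raising an error."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: ask Tweepy to wait automatically whenever a rate limit is reached\n",
"api = tweepy.API(auth,\n",
"                 wait_on_rate_limit=True,\n",
"                 wait_on_rate_limit_notify=True)"
]
},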
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Search API\n",
"\n",
"#### Search Parameters [[3]](#ref3)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"query = '#worldcup' # this is what we're searching for\n",
"en_lang = 'en' # this is used to specify the language of the tweets\n",
"popular_results = 'popular' # used to specifiy the order of tweet results. Accepted values: popular|recent|mixec\n",
"extended_mode = 'extended' # used to tell the API not to truncate the tweet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Query the endpoint"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"search_results = api.search(q=query, lang=en_lang, result_type=popular_results, \n",
" tweet_mode=extended_mode)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Print the result"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pprint import pprint\n",
"\n",
"print_till = 5\n",
"counter = 0\n",
"for tweet in search_results:\n",
" if counter < print_till:\n",
" pprint(tweet._json)\n",
" print('--------------------------------------------------------')\n",
" counter += 1"
]
},
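{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Fetching more than one page (optional sketch)\n",
"`api.search` returns a single page of results. The cell below is a sketch, assuming Tweepy 3.x, that pages through results with `tweepy.Cursor`; `max_tweets` is an illustrative limit, not part of the original walkthrough."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: page through search results with tweepy.Cursor\n",
"max_tweets = 100  # illustrative limit, adjust as needed\n",
"cursor_results = [tweet for tweet in tweepy.Cursor(api.search, q=query, lang=en_lang,\n",
"                                                   tweet_mode=extended_mode).items(max_tweets)]\n",
"print(len(cursor_results), 'tweets collected')"
]
},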
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save the tweets\n",
"#### Import the library and specify the file name"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import jsonpickle\n",
"\n",
"file_name = 'search_tweets.json'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Iterate through search results and save the tweet"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"with open(file_name, 'w') as f:\n",
" for tweet in search_results:\n",
" f.write(jsonpickle.encode(tweet._json, unpicklable=False) +\n",
" '\\n')"
]
},
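{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Reading the saved tweets back (optional sketch)\n",
"The cell below is a small sketch showing how the file written above can be loaded again: each line is decoded back into a Python dict with `jsonpickle.decode`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: load the tweets saved above back into a list of dicts\n",
"loaded_tweets = []\n",
"with open(file_name, 'r') as f:\n",
"    for line in f:\n",
"        loaded_tweets.append(jsonpickle.decode(line))\n",
"print(len(loaded_tweets), 'tweets loaded')"
]
},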
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Streaming API [[5]](#ref5)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"#override tweepy.StreamListener to add logic to on_status\n",
"class MyStreamListener(tweepy.StreamListener):\n",
"\n",
" def on_status(self, status):\n",
" print(status.text)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"myStreamListener = MyStreamListener\n",
"myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"myStream.filter(track=['worldcup'])"
]
},
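{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Handling stream errors (optional sketch)\n",
"The sketch below, assuming Tweepy 3.x, extends the listener with an `on_error` handler; returning `False` on HTTP 420 disconnects the stream instead of retrying and worsening the rate limiting. `RobustStreamListener` is a name introduced here for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: a listener that stops streaming on rate-limit errors (HTTP 420)\n",
"class RobustStreamListener(MyStreamListener):\n",
"\n",
"    def on_error(self, status_code):\n",
"        print('Stream error:', status_code)\n",
"        if status_code == 420:\n",
"            return False  # returning False disconnects the stream\n",
"\n",
"# To use it: tweepy.Stream(auth=api.auth, listener=RobustStreamListener())"
]
},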
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='ref1'></a>\n",
"[1] https://developer.twitter.com/content/developer-twitter/en.html"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='ref2'></a>\n",
"[2] https://github.com/tweepy/tweepy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='ref3'></a>\n",
"[3] https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='ref4'></a>\n",
"[4] https://jsonpickle.github.io/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='ref5'></a>\n",
"[5] https://developer.twitter.com/en/docs/tutorials/consuming-streaming-data.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import networkx as nx\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Download the following network: [\"http://snap.stanford.edu/data/egonets-Facebook.html\"](http://snap.stanford.edu/data/egonets-Facebook.html)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"FG = nx.read_edgelist(\"resources/facebook_combined.txt\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"nx.write_gexf(FG, \"facebook_combined.gexf\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Task 1: Visualize the Facebook Graph:\n",
"Use Gephi and customize the graph according to your personal preferences. Once you're satisfied, save the image in the PNG format and paste it in this notebook. \n",
"\n",
"What interesting observations are you able to make?"
]
},
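{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to paste the exported image here is with IPython's display utilities, as in the sketch below; `facebook_graph.png` is a placeholder for whatever filename you exported from Gephi."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: embed the PNG exported from Gephi (replace the filename with your own export)\n",
"from IPython.display import Image\n",
"Image(filename='facebook_graph.png')"
]
},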
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Task 2: Compute the following for the Facebook Graph:\n",
"1. Number of Nodes\n",
"2. Number of Edges\n",
"3. Average Number of Triangles\n",
"4. Average Clustering Coefficient\n",
"5. Average Degree Centrality\n",
"6. Average Betweenness Centrality\n",
"7. Average Preferential Attachment Score between nodes"
]
},
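{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible sketch for Task 2 is below (not a reference solution). Exact betweenness centrality on the full graph is slow, so the sketch samples `k` pivot nodes for an approximation, and the preferential attachment score is averaged over the non-adjacent node pairs that NetworkX returns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch for Task 2 (uses FG loaded above)\n",
"print('Nodes:', FG.number_of_nodes())\n",
"print('Edges:', FG.number_of_edges())\n",
"print('Average number of triangles:', np.mean(list(nx.triangles(FG).values())))\n",
"print('Average clustering coefficient:', nx.average_clustering(FG))\n",
"print('Average degree centrality:', np.mean(list(nx.degree_centrality(FG).values())))\n",
"\n",
"# Exact betweenness is expensive; k=100 pivot nodes gives an approximation\n",
"bc = nx.betweenness_centrality(FG, k=100)\n",
"print('Average betweenness centrality (approx.):', np.mean(list(bc.values())))\n",
"\n",
"# Average preferential attachment score over non-adjacent node pairs (may take a while)\n",
"total, count = 0, 0\n",
"for _, _, score in nx.preferential_attachment(FG):\n",
"    total += score\n",
"    count += 1\n",
"print('Average preferential attachment score:', total / count)"
]
},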
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Task 3: Comparing with A Random Graph:\n",
"1. Draw an [Erdos-Renyi Random Graph](https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.generators.random_graphs.erdos_renyi_graph.html#networkx.generators.random_graphs.erdos_renyi_graph) using NetworkX having the same number of nodes as the Facebook graph. Set the probability parameter p = 0.01\n",
"2. Compute the same 7 properties for this random graph that you computed for the Facebook graph. What observations can you draw from a comparative analysis? "
]
},
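{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible starting point for Task 3 is sketched below (`RG` and the seed are introduced here for illustration); the remaining properties can be computed exactly as in the Task 2 sketch, with `RG` in place of `FG`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch for Task 3: an Erdos-Renyi random graph with the same number of nodes\n",
"RG = nx.erdos_renyi_graph(n=FG.number_of_nodes(), p=0.01, seed=42)\n",
"print('Nodes:', RG.number_of_nodes())\n",
"print('Edges:', RG.number_of_edges())"
]
},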
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Task 4 [if time permits/take home]: Draw Plots:\n",
"Draw the plots of the degree distributions of the Facebook and Random Graphs."
]
},
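{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible sketch for Task 4, assuming `RG` from the Task 3 sketch above: overlay the two degree distributions with matplotlib."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch for Task 4: overlay the degree distributions of the Facebook and random graphs\n",
"fb_degrees = [d for _, d in FG.degree()]\n",
"rg_degrees = [d for _, d in RG.degree()]\n",
"\n",
"plt.hist(fb_degrees, bins=50, alpha=0.5, label='Facebook graph')\n",
"plt.hist(rg_degrees, bins=50, alpha=0.5, label='Random graph')\n",
"plt.xlabel('Degree')\n",
"plt.ylabel('Number of nodes')\n",
"plt.legend()\n",
"plt.show()"
]
}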
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
# Social Computing Summer School at IIIT-H
The repository contains the code base for the lab sessions of [Social Computing Summer School](https://search.iiit.ac.in/socialcomputing2018) at IIIT-H, 2018.
## Running Instructions
All of the code runs on Python 3.
To run the notebooks, first create a virtual environment (if not already created) and activate it by running:
```
$ virtualenv scenv
$ source scenv/bin/activate
```
Install the Python dependencies from the _requirements.txt_ file by running:
```
$ pip install -r requirements.txt
```
With the virtual environment activated and the dependencies installed, start the Jupyter Notebook server using:
```
$ jupyter notebook
```
backcall==0.1.0
bleach==2.1.3
boto==2.48.0
boto3==1.7.51
botocore==1.10.51
bz2file==0.98
certifi==2018.4.16
chardet==3.0.4
cycler==0.10.0
decorator==4.3.0
docutils==0.14
entrypoints==0.2.3
html5lib==1.0.1
idna==2.7
ipykernel==4.8.2
ipython==6.4.0
ipython-genutils==0.2.0
ipywidgets==7.2.1
jedi==0.12.1
Jinja2==2.10
jmespath==0.9.3
jsonpickle==0.9.6
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
kiwisolver==1.0.1
MarkupSafe==1.0
matplotlib==2.2.2
mistune==0.8.3
nbconvert==5.3.1
nbformat==4.4.0
networkx==2.1
notebook==5.5.0
numpy==1.14.5
oauthlib==2.1.0
pandas==0.23.2
pandocfilters==1.4.2
parso==0.3.0
pexpect==4.6.0
pickleshare==0.7.4
Pillow==5.2.0
prompt-toolkit==1.0.15
ptyprocess==0.6.0
Pygments==2.2.0
pyparsing==2.2.0
PySocks==1.6.8
python-dateutil==2.7.3
pytz==2018.5
pyzmq==17.0.0
qtconsole==4.3.1
requests==2.19.1
requests-oauthlib==1.0.0
s3transfer==0.1.13
Send2Trash==1.5.0
simplegeneric==0.8.1
six==1.11.0
smart-open==1.6.0
terminado==0.8.1
testpath==0.3.1
tornado==5.0.2
traitlets==4.3.2
tweepy==3.6.0
urllib3==1.23
wcwidth==0.1.7
webencodings==0.5.1
widgetsnbextension==3.2.1
wordcloud==1.4.1