BigW Consortium Gitlab
Forest Godfrey / xml-python-examples

Commit 39d60294, authored Feb 20, 2019 by Forest Godfrey
Python post parser to take Tumblr XML and schedule it with Hootsuite
Parent: eeea7342
Showing 2 changed files with 44 additions and 0 deletions:

.gitignore  +1 -0
posts.py    +43 -0
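The commit message summarizes the job. The sketch below shows the input shape posts.py appears to assume: a `<post>` element with several `<photo-url max-width="...">` children and a `<photo-caption>` whose text is escaped HTML. The sample XML and the `largest_photo_url` helper are hypothetical, written only to illustrate selecting the widest image with the standard `xml.etree.ElementTree` module; they are not part of the commit.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a Tumblr photo post; the element and
# attribute names match the ones posts.py reads.
SAMPLE = """
<posts>
  <post type="photo">
    <photo-caption>&lt;p&gt;Sunset, over the bay&lt;/p&gt;</photo-caption>
    <photo-url max-width="500">https://example.com/p_500.jpg</photo-url>
    <photo-url max-width="1280">https://example.com/p_1280.jpg</photo-url>
    <photo-url max-width="75">https://example.com/p_75.jpg</photo-url>
  </post>
</posts>
"""


def largest_photo_url(post):
    """Hypothetical helper: return the text of the widest <photo-url>."""
    best_url, best_width = "", 0
    for url in post.findall("photo-url"):
        try:
            width = int(url.get("max-width"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric max-width attribute
        if width > best_width:
            best_url, best_width = url.text, width
    return best_url


root = ET.fromstring(SAMPLE.strip())
post = root.find("post")
print(largest_photo_url(post))          # https://example.com/p_1280.jpg
print(post.find("photo-caption").text)  # <p>Sunset, over the bay</p>
```

In this sample the XML parser decodes `&lt;p&gt;` once, so the caption reaches Python as literal `<p>` markup, which is why the script runs `strip_html` on it before printing.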
.gitignore (new file, mode 100644)

```
*.pyc
```
posts.py (new file, mode 100644)
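```python
import datetime
from datetime import timedelta
import re
import xml.etree.ElementTree as ET


def strip_html(text):
    # Drop anything that looks like an HTML tag, e.g. <p> or <a href="...">.
    return re.sub("<.*?>", "", text)


def strip_commas(text):
    # Commas would break the comma-separated output lines printed below.
    return text.replace(",", "")


def remove_non_ascii(text):
    # Replace every non-ASCII character with a space.
    return ''.join([i if ord(i) < 128 else ' ' for i in text])


def process_post(post):
    """Return (largest photo URL, cleaned caption) for one <post> element."""
    urlstring = ""
    urlwidth = 0
    # One <photo-url> per image size; keep the one with the largest max-width.
    for url in post.findall('photo-url'):
        try:
            width = int(url.get('max-width'))
        except (TypeError, ValueError):
            continue  # missing or non-numeric max-width attribute
        if width > urlwidth:
            urlwidth = width
            urlstring = url.text
    caption = ""
    # Use the first <photo-caption>; an empty element has .text == None.
    for c in post.findall('photo-caption'):
        caption = c.text or ""
        break
    return urlstring, remove_non_ascii(strip_commas(strip_html(caption)))


tree = ET.parse('/tmp/posts.xml')
root = tree.getroot()

# First post is scheduled for 2019-02-21 17:00; each later one 30 minutes after the last.
d = datetime.datetime(2019, 2, 21, 17, 0)
t = timedelta(minutes=30)

# iter() finds <post> elements at any depth, so this also works when they
# are wrapped in a <posts> element rather than being direct children of the root.
for post in root.iter('post'):
    post_time = d.strftime("%Y/%m/%d %H:%M")
    post_url, post_text = process_post(post)
    d = d + t
    # One "time, text, url" line per post (print() works in Python 2 and 3).
    print("%s, %s, %s" % (post_time, post_text, post_url))
```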
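posts.py builds each output line with string formatting, so it removes commas from captions to keep the three fields separable. A possible alternative, sketched below for Python 3, is to let the standard `csv` module quote any field that contains a comma. `schedule_rows` and the sample posts are hypothetical; the start time, 30-minute step, and `%Y/%m/%d %H:%M` format mirror posts.py. Whether the downstream Hootsuite import accepts quoted fields is an assumption this sketch does not verify.

```python
import csv
import datetime
import io


def schedule_rows(posts, start, step=datetime.timedelta(minutes=30)):
    """Hypothetical helper: pair each (text, url) post with a time slot.

    Slot i is start + i * step, formatted like posts.py ("%Y/%m/%d %H:%M").
    """
    rows = []
    for i, (text, url) in enumerate(posts):
        when = start + i * step
        rows.append([when.strftime("%Y/%m/%d %H:%M"), text, url])
    return rows


# Hypothetical (caption, photo URL) pairs standing in for process_post() output.
posts = [
    ("Sunset, over the bay", "https://example.com/a.jpg"),
    ("Morning fog", "https://example.com/b.jpg"),
]
rows = schedule_rows(posts, datetime.datetime(2019, 2, 21, 17, 0))

buf = io.StringIO()
# QUOTE_MINIMAL (the default) quotes only fields containing the delimiter,
# a quote character, or a newline, so the first caption keeps its comma.
csv.writer(buf, lineterminator="\n").writerows(rows)
print(buf.getvalue(), end="")
# 2019/02/21 17:00,"Sunset, over the bay",https://example.com/a.jpg
# 2019/02/21 17:30,Morning fog,https://example.com/b.jpg
```

Computing each slot as `start + i * step` yields the same times as the script's running `d = d + t` update, without reassigning a shared variable inside the loop.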